Reading a file into a dataframe using PySpark in Databricks

The very first thing we need is a file. For this demo, our sample file is a comma-separated (CSV) file with 3 columns and 3 rows.
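The screenshot of the file isn't reproduced here, but a file of that shape could look like the following (the column names and values are invented purely for illustration):

id,name,city
1,John,New York
2,Jane,London
3,Sam,Paris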

  • So now that we have the file, the first step is to upload it to Databricks.
  • Get the cluster running.
  • Go to the Data tab.
  • Click on "Create Table".
  • Drag the file to the specified upload area.
  • At this point, the file is uploaded to DBFS storage. After this, we can work with the file.
  • Copy the location of the file. (Here: /FileStore/tables/SampleData-1.csv)
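A quick way to confirm the upload is to list the directory from a notebook cell using dbutils, which is available in Databricks notebooks:

# List the files under /FileStore/tables/ to confirm the upload landed in DBFS
display(dbutils.fs.ls("/FileStore/tables/"))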
  • Now, let's get back to the notebook to work with the file. 
  • Creating a dataframe from the file
dataframe_var = spark.read.option("header", "true").csv('/FileStore/tables/SampleData-1.csv')
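With only the header option set, every column is read as a string. If you also want Spark to guess the column types, one common variant is to add the inferSchema option:

# Same read, but let Spark sample the file and infer column types
dataframe_var = (spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv('/FileStore/tables/SampleData-1.csv'))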
  • Display the dataframe  
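In a Databricks notebook, the built-in display() function renders the dataframe as an interactive table (dataframe_var.show() would print it as plain text instead):

display(dataframe_var)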

  • We can see that the file data is successfully displayed in the dataframe.
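To double-check how the columns were read, you can also print the schema; without inferSchema, every column will show up as string:

dataframe_var.printSchema()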


We will take a look at how to load files into tables in the next blog post.

