Reading a file into a dataframe using PySpark in Databricks
The very first thing we need is a file. For this demo, our sample file is a comma-separated file with 3 columns and 3 rows.
- With the file ready, the first step is to upload it to Databricks.
- Get the cluster running.
- Go to the Data tab.
- Click on "Create Table".
- Drag the file to the specified space.
- At this point, the file is uploaded to DBFS storage, and we can start working with it.
- Copy the location of the file (here: /FileStore/tables/SampleData-1.csv).
- Now, let's get back to the notebook to work with the file.
- Create a dataframe from the file.
- Display the dataframe.
- We can see that the file data is successfully displayed in the dataframe.
We will take a look at how to load files into tables in the next blog post.