Posts

Creating tables from dataframes

Continuing from our last post, Reading a file into a dataframe, let's create a table from the dataframe:

dataframe_var.write.saveAsTable("database.tablename")

Keep in mind that the above syntax creates a managed table; to create an unmanaged table, we can add the path option. Now we can query the tables using SQL.
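Here is a minimal sketch of both variants, assuming a dataframe_var created as in the previous post (the database, table, and path names are placeholders for illustration):

# Managed table: Spark controls both the metadata and the data files
dataframe_var.write.saveAsTable("demo_db.sample_managed")

# Unmanaged (external) table: the path option keeps the data at our own location
dataframe_var.write.option("path", "/FileStore/tables/sample_external").saveAsTable("demo_db.sample_unmanaged")

The practical difference: dropping the managed table deletes its underlying data, while dropping the unmanaged table removes only the table definition and leaves the files at the path intact.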

Reading a file into a dataframe using PySpark in Databricks

The very first thing we need is a file. For this demo, our sample file is a comma-separated file with 3 columns and 3 rows. Now that we have that, the first step is to upload the file to Databricks:

1. Get the cluster running.
2. Go to the Data tab.
3. Click on "Create Table".
4. Drag the file to the specified space.

At this point, the file will be uploaded to DBFS storage, and we can work with it. Copy the location of the file (here: /FileStore/tables/SampleData-1.csv) and head back to the notebook.

Creating a dataframe from the file:

dataframe_var = spark.read.option("header", "true").csv('/FileStore/tables/SampleData-1.csv')

Display the dataframe, and we can see that the file data is successfully loaded. We will take a look at how to load files into tables in the next blog.
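As a quick sanity check, here is a minimal sketch of the read and display steps together (the inferSchema option and the printSchema call are illustrative additions, not from the original post):

# read the uploaded CSV, treating the first row as a header and inferring column types
dataframe_var = spark.read.option("header", "true").option("inferSchema", "true").csv("/FileStore/tables/SampleData-1.csv")

# render the rows in the notebook and confirm the inferred schema
display(dataframe_var)
dataframe_var.printSchema()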

Working on Databricks Community Edition

After logging in, the first thing to do is to create a cluster (if a cluster has already been created, proceed to the next step). For cluster creation, click the “Clusters” menu on the side ribbon, or use “New Cluster” from “Common Tasks”. Enter the cluster name, keep the rest of the settings as they are for now, and click “Create Cluster”. This will create the cluster and spin it up. Starting a cluster can take some time, so be patient.

The next step is to create a notebook, where the magic happens. For this, go to Workspace > YourUsername > Create (from the drop-down menu) > Notebook. Give the notebook a name, select the default language (don’t worry, any of the four languages can be used inside the notebook regardless of the default language, as shown below), and choose the newly created cluster. Once that’s done, it’ll look something like this (the default notebook theme can be Light; it can be changed from View): ...
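As a quick illustration of the any-of-the-four-languages point: a cell can switch language with a magic command at the top of the cell, regardless of the notebook's default. For example, in a Python-default notebook (the table name below is a placeholder):

%sql
-- this cell runs as SQL even though the notebook default is Python
SELECT * FROM demo_db.sample_managed LIMIT 10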

Create Databricks Community Edition Account

How to create a Databricks Community Edition account? Visit the site Try Databricks and fill in the required information.

Cursor and Its Life Cycle

What is a cursor in SQL Server? A cursor can be seen as a way to implement looping in SQL, which isn't a procedural programming language so much as a transactional one. SQL Cursor Life Cycle: the following steps are involved in a SQL cursor life cycle.

1. Declaring the Cursor: At this stage, we define the SQL statement for the cursor, which in turn defines the data set we will loop over.
2. Opening the Cursor: Once we know the data set, it's time to actually fetch it for our cursor to run through.
3. Fetching from the Cursor: This is the main step, the one that is looped over. It fetches one row at a time from our cursor so that operations can be performed on the fetched data.
4. Closing the Cursor: The cursor should be closed explicitly after data manipulation.
5. Deallocating the Cursor: The cursor should be deallocated to delete the cursor definition and release all the system resources associated with it.

A sketch of these stages appears below; in an upcoming blog, we will walk through a full example of a CURSOR.
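Ahead of that post, here is a minimal T-SQL sketch of the five stages, assuming a hypothetical dbo.Employees table with a Name column:

DECLARE @Name VARCHAR(100);

-- 1. Declare: define the result set the cursor will loop over
DECLARE employee_cursor CURSOR FOR
    SELECT Name FROM dbo.Employees;

-- 2. Open: populate the cursor with the result set
OPEN employee_cursor;

-- 3. Fetch: pull one row at a time and operate on it
FETCH NEXT FROM employee_cursor INTO @Name;
WHILE @@FETCH_STATUS = 0
BEGIN
    PRINT @Name;
    FETCH NEXT FROM employee_cursor INTO @Name;
END

-- 4. Close: release the result set
CLOSE employee_cursor;

-- 5. Deallocate: remove the cursor definition and free its resources
DEALLOCATE employee_cursor;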

Understanding: Cloud Computing, Azure Subscriptions, Azure Resource & Azure Resource Groups

What is Cloud Computing? On a day-to-day basis, we use a lot of services without even realizing they can be called that. From the databases used in our websites to the CPU and memory used for training our fresh machine learning model, we are surrounded by services which, until now, we were managing on our own. But with the spread of high-speed internet, tech giants realized that their data centers, apart from hosting a one-day mega online sale every year (we know it's you, Amazon!), could also be used to provide services to everyone else, and thus the data centers would pay for themselves. This gave rise to what we know today as Cloud Computing: providing compute (CPU/memory), storage, AI, and several other services over the Internet, with the best part being that we don't have to worry about maintenance or scalability either. The actual data centers are still located somewhere, but since we receive the services we require over the Internet, it's termed Cloud Computing...