Automating Databricks Environment | How to use Databricks Rest API | Databricks Spark Automation
Summary
TL;DR: In this session, the focus is on automation tools provided by Databricks. The discussion covers the transition from manual tasks to automated processes, particularly for deploying projects into production environments. The presenter introduces three approaches for automation: the Databricks REST API, the Databricks SDK, and the Databricks CLI. A detailed walkthrough of using the REST API to create and manage jobs is provided, including a live demonstration of automating job creation and execution within a Databricks workspace. The session aims to equip viewers with the knowledge to automate various tasks using these tools, with a comprehensive example set to be explored in the Capstone project.
Takeaways
- 🔧 The session focuses on automation tools provided by Databricks, which are crucial for automating tasks in a Databricks workspace.
- 🛠️ Databricks offers three main approaches for automation: REST API, Databricks SDK, and Databricks CLI, each suitable for different programming languages and use cases.
- 📚 The REST API is the most frequently used method, allowing users to perform almost any action programmatically that can be done through the Databricks UI.
- 🔗 The REST API documentation is platform-agnostic and provides a comprehensive list of endpoints for various Databricks services.
- 💻 The Databricks SDK provides language-specific libraries, such as Python, Scala, and Java, for automation tasks.
- 📝 The Databricks CLI is a command-line tool that enables users to perform UI actions through command-line commands, suitable for shell scripting.
- 🔄 The session includes a live demo of using the REST API to create and manage jobs in Databricks, showcasing the process from job creation to monitoring job status.
- 🔑 Authentication is a critical aspect of using Databricks REST API, requiring an access token that can be generated from the user's settings in the Databricks UI.
- 🔍 The process of creating a job via REST API involves defining a JSON payload that includes job details such as name, tasks, and cluster configurations.
- 🔎 The script provided in the session demonstrates how to automate job creation, triggering, and monitoring, which is part of a larger automation strategy in Databricks environments.
Q & A
What are the automation tools offered by Databricks?
-Databricks offers three approaches for automation: Databricks REST API, Databricks SDK, and Databricks CLI.
How does the Databricks REST API work?
-The Databricks REST API allows you to perform actions programmatically using HTTP requests. It's a universal tool that can be used from any language that supports calling REST-based APIs.
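For instance, a minimal sketch of calling the REST API from Python with the `requests` library might look like the following; the workspace URL and token are placeholders you would supply yourself:

```python
import requests

# Hypothetical values -- replace with your own workspace URL and personal access token.
HOST = "https://<your-workspace>.azuredatabricks.net"
TOKEN = "<personal-access-token>"
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

# List the jobs defined in the workspace (GET /api/2.1/jobs/list).
response = requests.get(f"{HOST}/api/2.1/jobs/list", headers=HEADERS)
response.raise_for_status()
for job in response.json().get("jobs", []):
    print(job["job_id"], job["settings"]["name"])
```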
What is the purpose of the Databricks SDK?
-The Databricks SDK provides language-specific libraries for Python, Scala, and Java, which can be used to interact with Databricks services in a more straightforward way than using raw REST API calls.
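As a rough sketch (not shown in this session), the Python SDK wraps the same endpoints behind a `WorkspaceClient`; the host and token below are placeholders:

```python
# pip install databricks-sdk
from databricks.sdk import WorkspaceClient

# Hypothetical credentials -- the SDK can also pick these up from environment
# variables or a Databricks CLI configuration profile.
w = WorkspaceClient(host="https://<your-workspace>.azuredatabricks.net",
                    token="<personal-access-token>")

# Equivalent of GET /api/2.1/jobs/list, returned as typed objects.
for job in w.jobs.list():
    print(job.job_id, job.settings.name)
```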
What can you do with the Databricks CLI?
-The Databricks CLI is a command-line tool that allows you to perform operations that you can do through the UI, making it useful for scripting and automation tasks.
How can you automate the creation of a job in Databricks?
-You can automate the creation of a job in Databricks by using the 'jobs create' REST API endpoint, which requires a JSON payload that defines the job configuration.
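A minimal sketch of that call in Python, assuming `HOST`, `HEADERS`, and a `job_payload` dictionary like the one built in the session's notebook:

```python
import json
import requests

# job_payload is a Python dict describing the job (name, tasks, job_clusters, ...).
create_response = requests.post(
    f"{HOST}/api/2.1/jobs/create",
    headers=HEADERS,
    data=json.dumps(job_payload),   # or json=job_payload
)
create_response.raise_for_status()
job_id = create_response.json()["job_id"]
print(f"Created job {job_id}")
```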
What is the role of the 'jobs run-now' API in Databricks automation?
-The 'jobs run-now' API is used to trigger the execution of a job in Databricks. It takes a job ID and, optionally, other parameters to start the job.
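A sketch of triggering the job once it exists, again assuming `HOST`, `HEADERS`, and the `job_id` returned by the create call:

```python
import requests

run_payload = {
    "job_id": job_id,
    # Optional: override notebook widget values for this run (illustrative names).
    "notebook_params": {"run_type": "streaming", "processing_time": "1 second"},
}
run_response = requests.post(f"{HOST}/api/2.1/jobs/run-now",
                             headers=HEADERS, json=run_payload)
run_response.raise_for_status()
run_id = run_response.json()["run_id"]
print(f"Triggered run {run_id}")
```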
How can you monitor the status of a job in Databricks using the REST API?
-You can monitor the status of a job run using the 'jobs runs get' API, which provides details about the run, including its current life cycle state.
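For example, a simple polling sketch against the runs endpoint; field names follow the Jobs 2.1 API, and `HOST`, `HEADERS`, and `run_id` are assumed from the previous calls:

```python
import time
import requests

while True:
    status = requests.get(f"{HOST}/api/2.1/jobs/runs/get",
                          headers=HEADERS, params={"run_id": run_id})
    status.raise_for_status()
    life_cycle = status.json()["state"]["life_cycle_state"]
    print("run state:", life_cycle)
    if life_cycle not in ("PENDING", "RUNNING"):
        break  # e.g. TERMINATED, SKIPPED, INTERNAL_ERROR
    time.sleep(10)
```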
What is the significance of the job ID and run ID in Databricks automation?
-The job ID uniquely identifies a job in Databricks, while the run ID identifies a specific execution of that job. These IDs are crucial for tracking and managing jobs and their runs programmatically.
How can you automate the deployment of a Databricks project to a production environment?
-You can automate the deployment of a Databricks project by using CI/CD pipelines that trigger on code commits, automatically build and test the code, and then deploy it to the Databricks workspace environment.
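As one illustrative building block (not something demonstrated in the session), a deployment step in such a pipeline could push a notebook into the target workspace with the Workspace Import API; the file name, target path, and token below are hypothetical:

```python
import base64
import requests

HOST = "https://<target-workspace>.azuredatabricks.net"   # e.g. the production workspace
HEADERS = {"Authorization": "Bearer <deployment-token>"}

# Read a notebook source file produced by the build and upload it, overwriting any existing copy.
with open("notebooks/run_notebook.py", "rb") as f:
    content = base64.b64encode(f.read()).decode("utf-8")

resp = requests.post(f"{HOST}/api/2.0/workspace/import", headers=HEADERS, json={
    "path": "/Production/run_notebook",
    "format": "SOURCE",
    "language": "PYTHON",
    "content": content,
    "overwrite": True,
})
resp.raise_for_status()
```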
What is the process of generating a JSON payload for job creation in Databricks?
-The JSON payload for job creation can be generated by manually defining the job through the UI, viewing the JSON, and copying it for use in automation scripts, or by constructing it programmatically based on the job's requirements.
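A trimmed example of what such a payload can look like as a Python dictionary; the job name, notebook path, and cluster spec below are illustrative placeholders, not the exact values from the session:

```python
job_payload = {
    "name": "sbit-stream-test",
    "max_concurrent_runs": 1,
    "tasks": [
        {
            "task_key": "run_stream_notebook",
            "notebook_task": {
                "notebook_path": "/Workspace/Users/me@example.com/run_notebook",
                "source": "WORKSPACE",
            },
            "job_cluster_key": "test_cluster",
        }
    ],
    "job_clusters": [
        {
            "job_cluster_key": "test_cluster",
            # Single-node job cluster; values are placeholders for an Azure workspace.
            "new_cluster": {
                "spark_version": "13.3.x-scala2.12",
                "node_type_id": "Standard_DS3_v2",
                "num_workers": 0,
                "spark_conf": {"spark.master": "local[*]"},
                "custom_tags": {"ResourceClass": "SingleNode"},
            },
        }
    ],
}
```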
How does the speaker demonstrate the use of Databricks REST API in the provided transcript?
-The speaker demonstrates the use of Databricks REST API by showing how to create a job, trigger it, and monitor its status using Python code that makes HTTP requests to the Databricks REST API endpoints.
Outlines
🤖 Introduction to Databricks Automation Tools
The speaker begins by introducing the session's focus on automation tools provided by Databricks. They discuss the various manual tasks that can be automated in Databricks, such as creating notebooks, clusters, and defining jobs with workflows. The speaker emphasizes the importance of automation in production environments and mentions the use of CI/CD pipelines to automate build processes, integration testing, and deployment. They introduce three main approaches for automation in Databricks: REST API, Databricks SDK, and Databricks CLI. The REST API is highlighted as the most frequently used method, with the documentation being a universal resource across all platforms. The speaker promises to provide a demo of these tools and their capabilities.
🔗 Exploring Databricks REST API
The speaker dives into the details of the Databricks REST API, explaining how it allows for automation of tasks within the Databricks workspace. They mention the various areas covered by the API, such as workspace, compute, workflows, and more. The speaker provides an example of how to use the Jobs API to list and create jobs, emphasizing the use of JSON for request and response payloads. They explain the structure of API calls, including the use of GET and POST methods, and provide a high-level overview of how to interact with the API. The speaker also demonstrates how to find specific API documentation and then creates a cluster in the workspace through the UI to prepare for the live demo.
📚 Automating Job Creation with REST API
The speaker illustrates how to automate the creation of a Databricks job using the REST API. They discuss the need for an integration test for a streaming application and how to create an automation test case for it. The speaker outlines the process of defining job parameters, such as the job name, schedule, tasks, and cluster configuration, within a JSON payload. They demonstrate how to use Python's 'requests' library to make a POST request to the Databricks REST API to create a job. The speaker also shows how to extract the job ID from the API response, which is crucial for further automation tasks.
📝 Demonstrating Job Creation and Execution
The speaker provides a practical demonstration of creating and executing a Databricks job using the REST API. They walk through the process of defining a job in the UI and then show how that definition can be replicated through the API. The speaker manually creates a job in the Databricks UI, detailing the steps involved in setting up a job with a notebook task, job cluster, and parameters. They then explain how to view the JSON definition of the job, which can be reused for automation. The speaker also demonstrates how to trigger a job run using the REST API and how to monitor the job's status until it starts execution.
🔄 Monitoring Job Status and Running Test Cases
The speaker continues the demonstration by showing how to monitor the status of a job run using the REST API. They explain the use of a while loop to check the job's lifecycle state until it transitions from pending to running. Once the job starts, the speaker outlines the steps for running test cases, which include loading historical data, validating data across different layers of a data architecture, and producing additional data batches. They emphasize the importance of waiting for the job to start before running test cases to ensure data availability. The speaker also mentions the use of additional REST APIs for job cancellation and deletion after testing is complete.
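The cleanup steps mentioned above map to two more Jobs API endpoints; a sketch, assuming the same `HOST`, `HEADERS`, `run_id`, and `job_id` as in the earlier examples:

```python
import requests

# Stop the still-running test job (POST /api/2.1/jobs/runs/cancel).
requests.post(f"{HOST}/api/2.1/jobs/runs/cancel",
              headers=HEADERS, json={"run_id": run_id}).raise_for_status()

# Remove the job definition itself (POST /api/2.1/jobs/delete).
requests.post(f"{HOST}/api/2.1/jobs/delete",
              headers=HEADERS, json={"job_id": job_id}).raise_for_status()
```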
🔄 Practical Implementation and Future Automation Topics
The speaker concludes the demonstration by running the automation script to create and execute a job, monitor its status, and run test cases. They successfully create a job and trigger it, demonstrating the practical application of the REST API for automation in Databricks. The speaker also mentions the upcoming topics of Databricks SDK and Databricks CLI for automation, indicating that these will be covered in future sessions. They invite questions from the audience while the job is being executed, highlighting the interactive nature of the session.
Keywords
💡Databricks
💡Automation
💡REST API
💡SDK
💡CLI
💡Notebooks
💡Workflows
💡Jobs
💡Clusters
💡Integration Testing
Highlights
Introduction to automation tools offered by Databricks.
Overview of automating tasks in Databricks workspace such as creating notebooks, clusters, and defining jobs.
The importance of automating deployment to production environments.
Building CI/CD pipelines for automated builds and deployments in Databricks.
Automating integration tests using Databricks' automation tools.
Three approaches for automating work in Databricks: REST API, Databricks SDK, and Databricks CLI.
Explanation of Databricks REST API and its documentation.
How to use Databricks REST API for creating and managing jobs.
Demonstration of using Python to interact with Databricks REST API for job creation.
Details on creating a JSON payload for job creation using REST API.
Using REST API to trigger a job and obtain a run ID.
Monitoring job status using REST API to ensure successful start and execution.
Automating test cases and validation checks in a Databricks notebook.
Process of cleaning up after tests are completed using REST API to cancel and delete jobs.
Integration of REST API usage in a full-fledged Capstone project for end-to-end automation.
Introduction to Databricks SDK as an alternative approach for automation.
Brief on Databricks CLI as a command-line tool for automation.
Invitation for questions while waiting for a job cluster to start.
Transcripts
Okay, so in today's session I want to talk about the automation tools offered by Databricks.

What are the things we want to automate? You have been learning Databricks, so you know that we can connect to the Databricks workspace, which gives us a browser-based UI. In the workspace we can create notebooks and write code, execute those notebooks, create clusters, attach notebooks to a cluster and run them on it, define jobs using Workflows, and then schedule those jobs to trigger automatically at a certain time, or trigger them manually. All of that you have already learned to do using the Databricks UI. But in real projects, not everything is done manually. At the end of the day, when your project is complete, you want to deploy it to the production environment, and there are a lot of things you want to automate, or do through code rather than by hand.

For example, you might want to build a CI/CD pipeline for your project and automate a few things with it: as soon as you commit your code to the repository, it should automatically trigger a build. The pipeline should pull the latest code from the repository, execute your unit test cases, package everything, and then deploy all your notebooks and other code files to the Databricks workspace for your production environment. That's one kind of automation. Or maybe you have written an integration test, a notebook with code for integration testing, and you want to trigger it automatically from the CI/CD pipeline itself: create a job cluster, run the integration test as a job on that cluster, and once the integration test has executed and passed, run a cleanup script. All of those things you may want to do from a CI/CD pipeline. For various other reasons as well, you might want to write code for creating a job in your Databricks production environment rather than having someone go and create it manually. So how do we automate that? What tools and capabilities does Databricks offer for automation? That is the topic for today.
Databricks offers three approaches for automating your work. The first, and the most frequently used, is the Databricks REST API; I'll show you where to find its documentation. The REST API can be called from any language that supports REST-based APIs, and practically every language does: Python, Java, Scala, and so on. So it is universal: you learn how to work with the REST API once and you can use it from any language. Second, Databricks also offers the Databricks SDK, a set of language-specific SDKs: there is an SDK for Python, one for Scala, and one for Java. The SDK is less commonly used, but the option is there. The third approach is the Databricks CLI, a command-line tool whose commands can do almost everything you can do through the UI, so it is another tool for automation. If we use the Databricks CLI, we will most likely be writing shell scripts that call different CLI commands; if we use the REST API, we will most likely be writing Python code for the automation. I'll give you a quick demo of all three approaches with small examples: how to use the REST API, how to use the Databricks SDK, and how to use the Databricks CLI. A more integrated and elaborate example is given in your Capstone project, where we use the Databricks REST API to automate a few things and the Databricks CLI to build the entire automated DevOps pipeline, so you will get a full-fledged example there. Today we want to learn how to use these tools and what we can do with them.
So let's close the slides and go to the browser, and I'll point you to the documentation link so you can refer to it, because the REST API is huge. Go to the Databricks REST API documentation. This documentation is common across all platforms: whether you are working in Azure, AWS, or Google Cloud, the REST API is the same for every platform and so is the documentation.

If you look at the REST API, it is broken down into different areas. There is the Databricks workspace REST API, which lets you write code for everything you want to do in the workspace: you can manage Git credentials, perform repository operations, work with secrets (store credentials and create Databricks secrets), get workspace object permissions, set workspace object permission levels, delete workspace objects, create directories, and so on. Everything and anything you can do through the UI has a REST API. For compute-related activities such as cluster creation, cluster policies, and cluster pools, there are compute REST APIs. Then there are REST APIs for workflows, Delta Live Tables, DBFS, machine learning, real-time serving, access management, Databricks SQL, Unity Catalog, Delta Sharing, tokens, and more. Everything you can do through the UI, you can do through the REST API.
Let's come to the Jobs API. The Jobs API is one of the REST APIs, and it allows you to work with Databricks workflow jobs. You can list jobs, and the API for that is /api/2.1/jobs/list, where 2.1 is the API version; there are multiple versions and the latest is 2.1. It is a GET API. If you know a little bit about REST APIs, you know there are GET APIs and POST APIs. So that is the API for listing jobs, and there is also an API for creating a new job: the jobs create API, which is a POST API. If you look at the details, the documentation explains how to use it, but at a high level every API works the same way: to make a call you provide some input, which is a JSON message that we call the request, and once the API is executed it gives you a response, which is also JSON output. That is how every API works; I will show you a demo of this. In the documentation you can see a sample: a typical request for the create API looks like this. The jobs create API will create a job in your Databricks workspace, and to create a job you have to specify a lot of things: the job name, the schedule for the job, the different tasks in the job, and so on. All of that you specify using JSON. You don't have to write this JSON by hand; we will see how to generate it. So for creating a job, the input is a JSON message that describes the job definition, and the response is simple: it is also JSON, and it tells you the job ID. You are using the REST API to create the job, and once the job is created it returns the job ID, which you can fetch. At the bottom of the documentation you can see that 200 is the success response, so if your response code is 200 the call succeeded and you also get the job ID; otherwise it may return other HTTP codes. If you have programmed against REST APIs in any other context, the same concepts apply here. Now you know where to look for the different kinds of APIs and the details of their inputs and outputs, because this is an exhaustive list. Now let me show you a demo of how to use it.
We'll go to our Azure account, where I already have one workspace that I created earlier, so let's go to that workspace and look at an example. This is my workspace. For doing anything I'll need a cluster, so let me create one: Create Compute, a single-node cluster without Photon, terminating after 60 minutes. It's about 0.75 DBU, so it's cheap. The cluster is creating and will be ready in a few minutes.

Now, I have this notebook, the stream test notebook, where I'm doing some work using the REST API, so let me explain what this notebook is. Let's assume I created an application, and once that application is done I also want to create an integration test case for it. My application is a streaming application: a real-time stream processing application that reads data from some source, and I have a three-layer architecture implemented with a bronze layer, a silver layer, and a gold layer. For those layers there are many jobs and processes defined: for the bronze layer I have defined three or four processes that read data from a landing zone or from some sources and ingest it into the bronze tables, then there are processes written to read data from the bronze layer and fill the silver layer, and similarly processes to build the gold layer. That's a typical project. Now I want to write an automation test case for that streaming application, and that is what this notebook is trying to do.
So at a high level, what I want to do as an automation test is: as a first step, create a job and trigger it, and that job will run the entire workflow. But I don't want to go and create that job manually. I could go to Databricks Workflows and create the job by hand, but instead I want to write code for creating the job, triggering it, executing my test case, performing validation, and performing cleanup afterwards: the entire test written as code so that I can automate it. I want to use the REST API for that, so let's see what I'm doing. This notebook takes three inputs at the beginning (the environment name, the host, and an access token) and extracts them into Python variables; you have already learned how to do that. Then I have a setup notebook, so I import it, create an instance of the setup class, and run the cleanup method from the setup module. You will get a good sense of this when you come to the Capstone project, because this is essentially part of your Capstone project, but I'm using it here for the demo. The cleanup method cleans the environment and removes everything.
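The three notebook inputs mentioned above are typically read through notebook widgets; a minimal sketch of that pattern, runnable only inside a Databricks notebook, with widget names chosen here for illustration:

```python
# Declare the input widgets with defaults, then read them into Python variables.
dbutils.widgets.text("env", "dev")
dbutils.widgets.text("host", "")
dbutils.widgets.text("access_token", "")

env = dbutils.widgets.get("env")
host = dbutils.widgets.get("host")
access_token = dbutils.widgets.get("access_token")
```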
After the cleanup is done, what I want to do is create a workflow job and trigger it. For creating a workflow job I need to use the REST API, and the REST API for creating a job is /api/2.1/jobs/create; that is the call I want to make, and it is a POST call. I'm using Python, so how do I make it in Python? In Python there is the requests package, which I import. There is one more package, json, because I'll be passing the argument as JSON and the response will come back as JSON, so I need to handle some JSON operations; I import the json package as well. These have nothing to do with Spark; they are pure Python packages.
pure python packages so I import that
and then using the requests I'm making a
post call right request. poost that's
how we make a rest API call in Python so
request. poost and why post because this
is a post method if it is a get method
I'll use request.get so request. poost
and request. poost takes three arguments
first is the URL for the rest API so URL
should be host name my workpace host
name right in which datab workpace I
want to run this
um API right so host name uh which is a
variable I created here in the beginning
host name so I'll take the host name as
an input for this notebook so host name
plus the rest API sorry plus the rest
API rest API URL you already learned
from the documentation this is the URL
so uh SL API 2.1 jobs create so that's
your rest API and then next argument is
uh the input parameter the input Json
right so we know that rest API takes
this kind of Json uh which defines the
job what job I want to create right so
which defines the job so I'm passing the
Json and that Json payload I already
defined here so if you look at the Json
payload this is my job definition right
this is my job definition from there to
here so how it looks what is the name of
the job do I need email notification no
web hooks no timeout no Max concurrent
runs I want one what is the task in the
that job so task name is is as bit
stream and it's a notebook task ask so
notebook can be found at this place so
in the work space so what I want this
job to do is to run this notebook right
and uh which cluster this notebook
should run so it should run on the job
cluster and then I Define job cluster
here and for job cluster uh spark
version should be this and maybe it's a
single node cluster so spark Master
should be this right and all those
definitions are uh defined here for the
cluster and that's how I Define the job
what job I want to create right uh but
But how do I get this JSON? Either you learn the whole JSON syntax from the documentation, where everything is defined, or an easier way is to go to Workflows. I know I want to create a job and I want to automate that job creation, but let's first see how we would create the same job manually. I go to Create Job, give the job a name, say sbit stream test, and add one task, run sbit notebook, since I want only one task in this job. The task type is Notebook, the source is Workspace, and for the path I browse to where the notebook can be found and select the notebook I want to run. Where should this job run? On a job cluster, yes, but let me edit that job cluster: I don't want a big cluster, I want a single node, because my job is small and a single node should work. That is my definition for the cluster, so I confirm it, and the job is now defined manually. Dependent libraries: no, I don't want to install any. What about parameters? Let me go to the workspace and check my notebook: this run notebook takes three parameters (environment name, run type, and processing time), but they all come with default values. Let me provide one parameter whose default I want to override: in the job definition I can give the parameter name and set its value to continuous instead of the default. With that the job is defined, and that's how we do it using the UI.

So let me create this job. I'm not going to run it; I just created the definition. Then I can come here, choose View JSON, and this is the JSON for the job definition. I can copy it and use it in my automation when I'm defining the job payload, the job definition input. The create job REST API requires an input JSON; you can prepare it manually, but nobody does that. What we do is generate the JSON in the UI, copy it, and then use it in our automation script, in our code. I'm not going to replace anything now, because I already have a similar JSON in the notebook, but that's how we define the JSON. Let me cancel here, come back to Jobs, and delete this job so we have a clean slate.
Once we have the JSON, the rest is simple. I call requests.post, passing the URL and the JSON. To convert the payload variable into valid JSON (it looks like JSON, but it is actually a Python dictionary object) I use json.dumps, passing the variable, which converts it into a valid JSON string. For calling the REST API we also need to provide the authentication token, so the auth token is the last parameter, which I take as an input to this notebook and which can be supplied here. So what this code does is make a call to the create job REST API; once it executes, we take the response into create_response, and from that response, using the .json() method, I can take out the job ID. This is just Python code to parse the response JSON and extract whichever element you want; the job_id element is what I want, so I take that job ID into a variable and print it. That's how we use the REST API.
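One thing worth adding around such a call (my addition, not from the session) is an explicit check of the HTTP status, since the Jobs API returns an error body on failure; variable names below are assumed from the earlier cells:

```python
import json
import requests

# Assumed to exist from earlier cells: host, access_token, and job_payload.
headers = {"Authorization": f"Bearer {access_token}"}

create_response = requests.post(f"{host}/api/2.1/jobs/create",
                                headers=headers, data=json.dumps(job_payload))
if create_response.status_code != 200:
    # The error body usually carries "error_code" and "message" fields.
    raise RuntimeError(f"jobs/create failed: {create_response.status_code} {create_response.text}")

job_id = create_response.json()["job_id"]
print("Created job:", job_id)
```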
I'm using some more REST APIs here. The next one is the jobs run-now REST API. If you come back to the documentation, you already saw how to use the create job API; there is also list jobs, get a single job, and trigger a new job run. Trigger a new job run is a POST API, and this is its URL. Creating the job only creates it in Databricks Workflows; we still need to trigger it, and to run the job there is another API, the run-now API, as described in the documentation. So I'm making another call to the run-now API. For that call I need input: run-now takes at minimum a job ID, the job to run. There are many other things you can provide, but the bare minimum is the job ID; it runs the job once, and if you want to schedule it on a regular interval you have to provide those details. I've built the run payload JSON here, which gives only the job ID and some notebook parameters. You saw that this job is supposed to run the run notebook, and the run notebook takes three arguments, so I can also pass those arguments from here: for environment I pass the environment, for run type I pass streaming, and for processing time I pass one second. With those three arguments I make the call, and once it executes I take the response back into the run_response variable and extract the run ID from it. This code gives me a run ID, which I print. Why do I need the run ID? Because I want to monitor the run: I created a job and took the job ID, then used that job ID to run the job (passing the job ID in the JSON input) and took the run ID, and using that run ID I'm going to wait for the status of the job.
So another REST API I'm calling here is jobs runs get: get a single job run. This API gives me the status of a given run. I want to monitor the job because triggering it will launch a new job cluster and then start the job, and until that cluster is created and the job status changes from pending to running, I want to wait. That's why I created a while loop here: in the loop I sleep for 10 seconds, then get the job status, take the response into a status variable, and from the status extract tasks[0].state.life_cycle_state. I learned that from the documentation's response sample: the get call returns everything about the run in the response, and what I want to track is the task state, specifically the life cycle state. The response has a tasks element, which is an array of tasks because a job can have multiple tasks; I know my job has a single task, so I take the first element of the array and look at its state. Inside the state there is life_cycle_state, which will be PENDING in the beginning, later RUNNING, and finally TERMINATED if I terminate the job. So I take the life cycle state into a job state variable and keep looping while it is pending. That's how we build the logic.
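Put together, the waiting logic described above might look roughly like this; it is a sketch with my own variable names, and it assumes `run_id`, `host`, and `headers` from the earlier cells:

```python
import time
import requests

job_state = "PENDING"
while job_state == "PENDING":
    time.sleep(10)
    status = requests.get(f"{host}/api/2.1/jobs/runs/get",
                          headers=headers, params={"run_id": run_id})
    status.raise_for_status()
    # The job has a single task, so look at the first (and only) task's state.
    job_state = status.json()["tasks"][0]["state"]["life_cycle_state"]
    print("life_cycle_state:", job_state)
```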
Once my job has started, I want to run my test cases. In the test cases I want to load some historical data; these are the packages I import and the objects I create, and the code is here. So once the job is started I call produce first batch of data, then validate first batch of data, then sleep for two minutes so that the data is picked up by the bronze layer and flows through all three layers. Then I validate the bronze layer, call the validate function for the silver layer, call the validate function for the gold layer, produce a second batch of data, and so on; all of that is there. If all of this runs successfully, my integration test is assumed to have passed; if it fails, you get the error. Once everything is done, I use one more REST API to cancel the run, because this is a test job: once testing is done I want to cancel that run, and I also want to delete the job, so I use the REST APIs for that as well, and at the end I print a success message. That's how I automated it. If I run it you will see everything happening.
So let's run it. Let me close everything else. For running it I need to connect the notebook to a running cluster and provide these inputs. Dev is fine for the environment. For the Databricks workspace URL: I want to create the job in this same workspace, but if you wanted to create the job in your QA workspace you would provide the QA workspace URL as the input. Since I want to run it in the same workspace, I provide this workspace's URL. The URL goes from the start up to this point; everything after that is just URL arguments. So I copy it and paste it here (the trailing slash is probably not needed), and that's the workspace URL. This code will create a job, and for creating a job you also need to authenticate, and for authentication we need a token. We already know where to create one: go to User Settings. The UI keeps changing and they moved the User Settings page around, so access tokens are now under the Developer menu: click Access tokens, then Manage. I already have two tokens that I created earlier; I don't need them, so let me delete them and generate a new token. You can give it a comment (this one is temporary, I'll delete it later) and a lifetime of one day. The token is created, so I copy it and paste it here as an argument. Now we are ready to run. I could run cell by cell or run the whole notebook, and you would see everything in action, but let's go one by one so we can see what is happening.
The variables are defined and we got them into Python; then the setup is imported, and I ran the cleanup, so the cleanup module cleans the environment and prepares it for running my integration test. That's part of a typical project: almost every project will have a cleanup script if you are doing automation testing. Cleanup is done; then I define my job payload, and from here I can start making calls to the REST API. As soon as I execute this, the code will create a new job. Let's open Workflows and check: we don't have any job here right now, so if this works correctly a job should automatically get created. Let me run it... done, the job is created, and this is the job ID. If you come to Workflows you can see the sbit stream job is created. It's not running, but the job is created; you can click it and it says Run now. Go to the task: there is one task definition, all the notebook details, the job cluster configuration, everything is in place. So the job is created but not running, and I also have code to run it. Here is an example of how to run a job using the REST API. Let me run this... the job is started and here is the run ID. If you come back to the UI you can confirm it: the job run shows up here, it's running and in the pending state, because it's launching a job cluster, which takes maybe four or five minutes. So let's run the next part, where we keep waiting until the job cluster is created and the job is in the running state. We know the status is pending, and that's what we track here: the loop waits for 10 seconds, prints pending, waits another 10 seconds, checks the status again, prints pending, and keeps waiting until the job starts. Until the job cluster is created and the job has started, there is no point in running our test cases; they would definitely fail because they wouldn't find any data, so all the validations would fail. That is how we wait for the job to start; as soon as it starts we come out of the loop. Then we can run the rest, which will perform the validations, and at the end I have a script to do the cleanup. That's how we use the REST API.
I hope that made sense and that you have become familiar with how to use the REST API for automating things in a Databricks environment. We will use this technique fully, with a proper end-to-end example, in our Capstone project; we are nearing the point where we should start talking about the Databricks Capstone project, so you will learn that soon. We are left with two more approaches for automation, the Databricks SDK and the Databricks CLI. We have already used a lot of time, so it probably won't be possible to cover the SDK and the CLI today; I'll cover them in the next session. And while the notebook is waiting for the cluster to start, we can take some questions.