Wednesday, October 16, 2024

Convert date in Databricks - JD Edwards Julian date to calendar date



CREATE FUNCTION yourdatabase.date_convert_j_to_c(gldgj INT)
RETURNS DATE
RETURN date_sub(
    dateadd(DAY, gldgj % 1000,
        dateadd(YEAR, gldgj DIV 1000, DATE'1900-01-01')
    ),
    1
);
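For reference, the same CYYDDD decoding (CYY = years since 1900, DDD = day of year) can be sketched in plain Python. This is just an illustration of the arithmetic, not the Databricks UDF itself; the function name is made up:

```python
from datetime import date, timedelta

def jde_julian_to_date(gldgj: int) -> date:
    """Convert a JD Edwards 'Julian' date (CYYDDD) to a calendar date.

    CYY is the number of years since 1900 and DDD is the day of year,
    so 124290 decodes to day 290 of 2024.
    """
    years_since_1900 = gldgj // 1000
    day_of_year = gldgj % 1000
    return date(1900 + years_since_1900, 1, 1) + timedelta(days=day_of_year - 1)

print(jde_julian_to_date(124290))  # 2024-10-16
```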

Friday, June 21, 2024

Monday, June 10, 2024

Load data from SQL Server to databricks using Excel VBA

Sub Upload_Data_To_SQLServer_Dynamic()
    ' Define variables
    Dim Cn As ADODB.Connection
    Dim Cn2 As ADODB.Connection
    Dim rs As ADODB.Recordset
    Dim uploadQuery As String
    Dim batchCounter As Long
    Dim batchSize As Long
    Dim batchQuery As String
    Dim headerArray() As String
    Dim numColumns As Integer
    Dim columnList As String
    Dim sqlQuery As String
    Dim TableName As String
    Dim dsnName As String
    Dim i As Integer
    Dim StartTime As Double

    StartTime = Timer

    ' Establish connection to SQL Server using Windows Authentication
    Set Cn = New ADODB.Connection
    Cn.ConnectionString = "Provider=SQLOLEDB;Data Source=YourServerName;Initial Catalog=mfdb;Integrated Security=SSPI;"
    Cn.Open

    ' SQL query to get data from the source table (modify as needed)
    sqlQuery = "SELECT * FROM dbo.tblco_list"

    ' Execute the query and get a Recordset
    Set rs = New ADODB.Recordset
    rs.Open sqlQuery, Cn, adOpenStatic, adLockReadOnly

    ' Initialize variables
    batchCounter = 0
    batchSize = 80
    TableName = "DatabricksDatabasename.tblCoListCon_Temp"
    uploadQuery = "INSERT INTO " & TableName & " ("

    ' Read the header row to determine the number of columns
    numColumns = rs.Fields.Count
    ReDim headerArray(numColumns - 1)
    For i = 0 To numColumns - 1
        headerArray(i) = rs.Fields(i).Name
        columnList = columnList & rs.Fields(i).Name
        If i < numColumns - 1 Then
            columnList = columnList & ", "
        End If
    Next i
    uploadQuery = uploadQuery & columnList & ") "

    ' Initialize the batch query
    batchQuery = ""

    ' Establish connection to Databricks via DSN (replace details with your own)
    dsnName = "_DL_PRD_DSN_DBLT"
    Set Cn2 = New ADODB.Connection
    Cn2.Open "DSN=" & dsnName & ";Uid=databricksuserid;Pwd=databricksPassword;"

    ' Read and process the data from the recordset
    Do While Not rs.EOF
        ' Build the SELECT part dynamically based on data types (modify as needed)
        batchQuery = batchQuery & "SELECT "
        For i = 0 To numColumns - 1
            If IsNull(rs.Fields(i).Value) Then
                batchQuery = batchQuery & "NULL"
            Else
                batchQuery = batchQuery & "'" & Replace(rs.Fields(i).Value, "'", "''") & "'"
            End If
            If i < numColumns - 1 Then
                batchQuery = batchQuery & ", "
            End If
        Next i
        batchQuery = batchQuery & " UNION ALL " & Chr(10)
        batchCounter = batchCounter + 1

        ' Check if batch size is reached
        If batchCounter >= batchSize Then
            ' Remove the last 'UNION ALL' and execute the batch query
            If Len(batchQuery) > 0 Then
                batchQuery = Left(batchQuery, Len(batchQuery) - Len(" UNION ALL " & Chr(10)))
                Debug.Print uploadQuery & Chr(10) & batchQuery
                Cn2.Execute uploadQuery & Chr(10) & batchQuery
            End If
            ' Reset the batch variables
            batchQuery = ""
            batchCounter = 0
        End If
        rs.MoveNext
    Loop

    ' Execute any remaining batch query
    If batchCounter > 0 And Len(batchQuery) > 0 Then
        batchQuery = Left(batchQuery, Len(batchQuery) - Len(" UNION ALL " & Chr(10)))
        Cn2.Execute uploadQuery & Chr(10) & batchQuery
    End If

    ' Close the recordset and both connections
    rs.Close
    Cn.Close
    Cn2.Close

    ' Inform user of successful upload
    MsgBox Format((Timer - StartTime) / 86400, "hh:mm:ss") & " Data upload to Databricks completed!", vbInformation

    ' Clean up
    Set rs = Nothing
    Set Cn = Nothing
    Set Cn2 = Nothing
End Sub
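The batching trick in the VBA above (chaining rows with UNION ALL so each round trip inserts many rows at once) can be sketched in Python. The function and table names here are hypothetical; it only builds the statement text, it does not execute anything:

```python
def build_batch_insert(table: str, columns: list, rows: list) -> str:
    """Build one INSERT ... SELECT ... UNION ALL statement for a batch of rows,
    escaping single quotes and rendering None as NULL (same logic as the VBA loop)."""
    def render(value) -> str:
        if value is None:
            return "NULL"
        return "'" + str(value).replace("'", "''") + "'"

    selects = ["SELECT " + ", ".join(render(v) for v in row) for row in rows]
    return (
        "INSERT INTO " + table + " (" + ", ".join(columns) + ")\n"
        + "\nUNION ALL\n".join(selects)
    )

sql = build_batch_insert("db.tbl", ["id", "name"], [[1, "O'Neil"], [2, None]])
print(sql)
# INSERT INTO db.tbl (id, name)
# SELECT '1', 'O''Neil'
# UNION ALL
# SELECT '2', NULL
```

Batching matters here because each Cn2.Execute is a network round trip to Databricks; at batchSize = 80 you pay that cost once per 80 rows instead of once per row.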

Thursday, June 6, 2024

How to load data (CSV file) in databricks delta table using Excel, VBA and DSN


1. Create DSN using Simba driver on your machine

2. Get token and password from databricks

3. The CSV column header names must match the table's column names.


Sub UploadDataToSQLServer_Dynamic()
    ' Define variables
    Dim sFile As String
    Dim Cn As ADODB.Connection
    Dim UploadQuery As String
    Dim fso As Object
    Dim ts As Object
    Dim line As String
    Dim DataArray() As String
    Dim i As Long
    Dim j As Long
    Dim batchCounter As Long
    Dim batchSize As Long
    Dim batchQuery As String
    Dim headerArray() As String
    Dim numColumns As Integer
    Dim columnList As String
    Dim dsnName As String
    Dim TableName As String
    Dim StartTime As Double

    StartTime = Timer

    ' Get the CSV file to upload
    sFile = Application.GetOpenFilename("CSV Files (*.csv), *.csv", , "Select CSV File to Upload")
    If sFile = "False" Then Exit Sub ' User canceled the file selection

    ' Open the CSV file
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set ts = fso.OpenTextFile(sFile, 1) ' 1 = ForReading

    ' Establish connection to Databricks via DSN (replace details with your own)
    dsnName = "YourDSNName"
    Set Cn = New ADODB.Connection
    Cn.Open "DSN=" & dsnName & ";Uid=YourTokenFromDatabricks;Pwd=YourPasswordFromDatabricks;"

    ' Initialize variables
    batchCounter = 0
    batchSize = 80
    TableName = "DatabaseName.TableName"
    UploadQuery = "INSERT INTO " & TableName & "("

    ' Read the header row to determine the number of columns
    If Not ts.AtEndOfStream Then
        line = ts.ReadLine
        headerArray = Split(line, ",")
        numColumns = UBound(headerArray) + 1
        columnList = Join(headerArray, ", ")
        UploadQuery = UploadQuery & columnList & ") "
    End If

    ' Initialize the batch query
    batchQuery = ""

    ' Read and process the CSV file
    i = 0
    Do Until ts.AtEndOfStream
        line = ts.ReadLine
        DataArray = Split(line, ",")

        ' Build the SELECT part dynamically based on data types (modify as needed)
        If UBound(DataArray) = UBound(headerArray) Then
            batchQuery = batchQuery & "SELECT "
            For j = 0 To numColumns - 1
                batchQuery = batchQuery & "'" & Replace(DataArray(j), "'", "''") & "'"
                If j < numColumns - 1 Then
                    batchQuery = batchQuery & ", "
                End If
            Next j
            batchQuery = batchQuery & " UNION ALL " & Chr(10)
            batchCounter = batchCounter + 1

            ' Check if batch size is reached
            If batchCounter >= batchSize Then
                ' Remove the last 'UNION ALL' and execute the batch query
                If Len(batchQuery) > 0 Then
                    batchQuery = Left(batchQuery, Len(batchQuery) - Len(" UNION ALL " & Chr(10)))
                    ' Debug.Print UploadQuery & Chr(10) & batchQuery
                    Cn.Execute UploadQuery & Chr(10) & batchQuery
                End If
                ' Reset the batch variables
                batchQuery = ""
                batchCounter = 0
            End If
        End If
        i = i + 1
    Loop

    ' Execute any remaining batch query
    If batchCounter > 0 And Len(batchQuery) > 0 Then
        batchQuery = Left(batchQuery, Len(batchQuery) - Len(" UNION ALL " & Chr(10)))
        Debug.Print UploadQuery & Chr(10) & batchQuery
        Cn.Execute UploadQuery & Chr(10) & batchQuery
    End If

    ' Close the text stream and the connection
    ts.Close
    Cn.Close

    ' Inform user of successful upload
    MsgBox Format((Timer - StartTime) / 86400, "hh:mm:ss") & " Data upload to Databricks SQL completed!", vbInformation

    ' Clean up
    Set Cn = Nothing
    Set fso = Nothing
    Set ts = Nothing
End Sub



Thursday, February 15, 2024

Databricks - How to create function UDF

A user-defined function (UDF) is a means for a user to extend the native capabilities of Apache Spark™ SQL. SQL on Databricks has supported external user-defined functions written in Scala, Java, Python and R programming languages since 1.3.0. While external UDFs are very powerful, they also come with a few caveats:

  • Security. A UDF written in an external language can execute dangerous or even malicious code, so tight control is required over who can create UDFs.
  • Performance. UDFs are black boxes to the Catalyst optimizer. Because Catalyst is not aware of a UDF's inner workings, it cannot do any work to improve the UDF's performance within the context of a SQL query.
  • SQL usability. For a SQL user it can be cumbersome to write UDFs in a host language and then register them in Spark. There is also a set of simple extensions many users want to make to SQL for which developing an external UDF is overkill.

 https://www.databricks.com/blog/2021/10/20/introducing-sql-user-defined-functions.html

Thursday, January 18, 2024

OpenAI with Databricks SQL for queries in natural language

Modern data platforms store and collect an incredible amount of both useful data and metadata. However, even knowing the metadata itself might not be useful for end users who don't have enough experience with the classical components of a relational data model. One of the challenges is not only the ability to write proper SQL statements to select the relevant information but also understanding what needs to be joined (and how exactly this should be done) even to get the simplest insights (e.g. the top-5 customers from a given region by the number of orders).

 https://polarpersonal.medium.com/using-openai-with-databricks-sql-for-queries-in-natural-language-cf6521e88148
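A common pattern behind this approach is to feed table metadata into the prompt so the model knows which tables exist and what can be joined. Here is a minimal, hypothetical prompt builder in that spirit (the function name, table names, and wording are all illustrative, not from the article):

```python
def build_sql_prompt(question: str, schemas: dict) -> str:
    """Assemble a natural-language-to-SQL prompt from table metadata.

    `schemas` maps table name -> list of column names (example data only).
    """
    ddl = "\n".join(
        "Table {}({})".format(table, ", ".join(cols))
        for table, cols in schemas.items()
    )
    return (
        "Given these tables:\n" + ddl
        + "\nWrite a Databricks SQL query to answer: " + question
    )

prompt = build_sql_prompt(
    "top-5 customers from a given region by number of orders",
    {"customers": ["id", "name", "region"],
     "orders": ["id", "customer_id", "order_date"]},
)
print(prompt)
```

The model then only has to reason over the schema text you supplied, rather than guessing table and column names.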

Wednesday, October 18, 2023

Copy CSV file to Databricks Delta table

COPY INTO delta.`abfss://container@storageAccount.dfs.core.windows.net/deltaTables/target`
  FROM (SELECT key, index, textData, 'constant_value'
          FROM 'abfss://container@storageAccount.dfs.core.windows.net/base/path')
  FILEFORMAT = CSV
  PATTERN = 'folder1/file_[a-g].csv'
  FORMAT_OPTIONS('header' = 'true')

-- The example below loads CSV files without headers on ADLS Gen2 using COPY INTO.
-- By casting the data and renaming the columns, you can put the data in the schema you want
COPY INTO delta.`abfss://container@storageAccount.dfs.core.windows.net/deltaTables/target`
  FROM (SELECT _c0::bigint key, _c1::int index, _c2 textData
        FROM 'abfss://container@storageAccount.dfs.core.windows.net/base/path')
  FILEFORMAT = CSV
  PATTERN = 'folder1/file_[a-g].csv'
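The cast-and-rename idea for headerless files (positional columns _c0, _c1, ... cast to typed, named columns) can be mimicked locally in plain Python. This sketch uses the same example schema as the COPY INTO above; the function name is made up:

```python
import csv
import io

# Target schema: (column name, cast) per positional CSV column, as in the
# SELECT _c0::bigint key, _c1::int index, _c2 textData example above.
SCHEMA = [("key", int), ("index", int), ("textData", str)]

def load_headerless_csv(text: str) -> list:
    """Parse headerless CSV rows, casting each positional column to SCHEMA."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        rows.append({name: cast(value)
                     for (name, cast), value in zip(SCHEMA, record)})
    return rows

print(load_headerless_csv("1,10,hello\n2,20,world"))
# [{'key': 1, 'index': 10, 'textData': 'hello'},
#  {'key': 2, 'index': 20, 'textData': 'world'}]
```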


 https://docs.databricks.com/en/ingestion/copy-into/examples.html


Thursday, August 17, 2023

Databricks Rename Table Column Name

ALTER TABLE enterprise_db.tblmf_sources SET TBLPROPERTIES (
  'delta.minReaderVersion' = '2',
  'delta.minWriterVersion' = '5',
  'delta.columnMapping.mode' = 'name'
);

ALTER TABLE enterprise_db.tblmf_sources RENAME COLUMN sourceName TO SourceName;

Friday, May 26, 2023

Remove and add widgets in databricks dbutils.widgets



Remove all widgets: dbutils.widgets.removeAll()

Add a text widget: dbutils.widgets.text("EnterCaseNo", "17102603374")

Thursday, May 11, 2023

Writing letter using openAI python

This code was generated by chat.openai.com for a Databricks notebook; I modified it to suit my needs.


 %python

import os

import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

#prompt = "Write a news article about a new breakthrough in cancer research."
#prompt = "Write a letter about cancer research. to a doctor "
#prompt = "Write a letter to supervisor about leave"
prompt =dbutils.widgets.get("SearchBox")

response = openai.Completion.create(
    engine="text-davinci-002",
    prompt=prompt,
    max_tokens=1024,
    n=1,
    stop=None,
    temperature=0.5,
)

article = response.choices[0].text
#print(article)

df = spark.createDataFrame([(article,)], ["article"]  )
# Display the DataFrame as a table
display(df)

Tuesday, April 18, 2023

How to see stored procedure code if you don't have direct access to SQL Server

 EXEC [SERVER NAME].[DATABASE NAME].DBO.SP_HELPTEXT 'Stored Proc Name'


Thursday, April 13, 2023

Databricks SQL Useful Shortcut

 


Ctrl+Shift+L : List tables in the current database.

Ctrl+Shift+T : Create a new table.

Ctrl+Shift+D : Drop a table.

Ctrl+Shift+U : Update a table.

Ctrl+Shift+I : Insert a row into a table.

Ctrl+Shift+D : Delete a row from a table.

Ctrl+Shift+F : Filter data in a table.

Ctrl+Shift+S : Sort data in a table.

Ctrl+Shift+P : Perform a SQL query.

Databricks Notebook Useful Shortcut

Databricks notebook shortcut

Ctrl+Enter : Run the current cell.

Shift+Enter : Run the current cell and move to the next cell.

Ctrl+Alt+p : Create a cell above the current cell.

Ctrl+Alt+n : Create a cell below the current cell.

Ctrl+Z : Undo the last action.

Ctrl+Y : Redo the last action.

Ctrl+A : Select all text in the current cell.

Ctrl+C : Copy the selected text.

Ctrl+V : Paste the copied text.

Ctrl+X : Cut the selected text.

Ctrl+L : Show or hide line numbers.

Ctrl+K : Comment out the selected text.

Ctrl+U : Uncomment the selected text.

Ctrl+Shift+F : Format the code in the current cell.

Ctrl+Shift+B : Run all cells in the current notebook.

Ctrl+Shift+S : Save the current notebook.

Ctrl+Shift+G : Go to line number.

Ctrl+Shift+H : Find and replace text.

Ctrl+Shift+P : Open the command palette.

Ctrl+Shift+Q : Quit Databricks.

Alt + Enter: Run the current cell and insert a new one below it

Esc + 0: Restart the kernel

Esc + M: Convert the current cell to a Markdown cell

Esc + Y: Convert the current cell to a code cell

Esc + A: Insert a new cell above the current cell

Esc + B: Insert a new cell below the current cell

Esc + X: Cut the current cell

Esc + C: Copy the current cell

Esc + V: Paste the cell that was cut or copied

Esc + D, D: Delete the current cell


Wednesday, March 22, 2023

ISNULL in Databricks SQL

Use coalesce:

coalesce(member_count, 0)

Wednesday, March 8, 2023

Command line executable sql server ChatGPT

A command line executable for SQL Server is a file that can run SQL queries or scripts using a command-line utility such as sqlcmd. It usually has an extension like .sql or .bat. To run it, you need to know its file path and use the appropriate syntax for your utility. For example:


In sqlcmd, you can use the -i option to specify the input file name. For example: sqlcmd -S server_name -d database_name -i C:\Scripts\script.sql

In a batch file, you can use the sqlcmd command with the input file name as an argument. For example: sqlcmd -S server_name -d database_name -i C:\Scripts\script.sql

You can also pass other options to the sqlcmd utility, such as -o to specify the output file name, -v to define variables, or -q to execute a query directly. For more information on sqlcmd options, see here.


If you want to create your own command line executable for SQL Server, you can