Python and SQL

Thanks to a Black Friday sale, I signed up for a Udemy Machine Learning class. The first few chapters are a Python review. I have used Python on and off but I do want to use it more so I can move from an advanced beginner status. I took the classes and decided to use my blog to expand on the work they had us do in the class.

The chapter on Pandas really focuses on data frames. This blog will take a side by side look at the Pandas data frame commands vs SQL commands. The code below creates and populates a Pandas data frame. This code comes explicitly from the class.

Now there are a couple of ways I could create this in SQL. I could use the SQLAlchemy package to upload the frame into my database. I could save the data frame to a .csv and then upload that into my database. Or I could manually create the table myself.

In the Python code, the index doesn’t have an explicit name: it’s just index. An index in SQL is a different type of structure, so to simplify this explanation, I made it a column and called it “Letter”.

Selecting a column is easy in Python. You just call the data frame and the column.

SQL uses the traditional select statement.

Do you want to get more than 1 column? No problem. Just call both columns and put them in double brackets.

SQL just needs a , (column name) in the select statement and you can have both columns too!

You can easily create a new column in Python by adding 2 columns and giving the new column a name. In the class example, the teacher called it “new”

SQL isn’t quite as nimble. You have to alter the table to add a new column and then update the new column.

Now if you want to drop that same column, in Python you have to use the inplace=True option to reassure Python that yes, you really do want to eliminate that column forever and ever.

In SQL, you tell the database that you are editing a table and then tell it exactly what you want to edit. In our case, we want to eliminate that pesky new column.

So what if you want to select an entire row of data?

In Python you can tell it which row by using the index. Remember our index is letters. In this case we want everything from the A row. Look at that…it’s a row of data but Python displays it as a column. (PIVOT right?)

And here’s our trusty Select statement with a where clause to get an entire row of A data.

What’s that? You want 2 rows and 2 columns of data? No problem with either Python or SQL.

Simple selects and where clauses are pretty easy right? What about something more advanced? What if we want everything in the table greater than 0?

SQL isn’t quite as robust. We have to select columns here and the results don’t get converted to boolean.

Let’s get a little more complex. Let’s ask for everything in Y and X that is on the same row in W that is over 0

The SQL statement here is pretty easy:

What if we want data where the W column is greater than 0 and the Y column is greater than 1? You can use parenthesis and an ampersand to select that from the data frame.

In SQL, you can put those criteria into your where clause (that’s what it’s for.)

SQL Server Analysis Services Sequence Clustering Part 2

In the first post of this series, I demonstrated how to model the data for this project. This post walks though using the SQL Data Tools to set up the model. Open the SQL Data Tools and select Analysis Services Multidimensonal and Data Mining Project.  Name your project (I named mine SequenceClustering) and enter a location to save the files.  Click Ok

The SQL Data Tools now opens the workspace that you will use to create your model.  Right click on Data Sources to use the wizard to tell the tool where your data is stored. 

Now you are going to tell SQL Data Tools how your data is structured. Right click on Data Source Views.  This will open a wizard that will ask you to confirm your data source and then ask you to select the objects from that data source that you want to use.

For this example, there are 2 tables:

Right click on Mining Structures and click Create New Mining Structures.  This opens a wizard to walk you through creating the structure.

Tell the wizard to create the structure from an existing relational database or data warehouse and click next.

Select Microsoft Sequence Clustering from the data mining technique drop down and click Next

Now we set up the Case and Nested table roles.  VisitDetails is the nested table and WebSiteVisit is the case table.  Once you click the right boxes, click next.

Now the wizard will ask you what you want to be able to predict from this model and what you want to use as input to create those predictions. 

In the next step, the model is confirming the data types

Click next

At this end of this process, you should have something that looks like this:

Now that you’ve created the data mining objects, it’s time to let your model read the underlying data and make sense of it.  Right click on the model and click process.

After you click Run, the model will start reading data and creating sequence clusters.

This processing provides data to the model so it can determine the appropriate clusters for the sequences. After it is done, you will be able to browse and query the data. I will walk you through that in the next post.

SQL Server Analysis Services: Sequence Clustering

Why didn’t SSAS’s data mining tool receive more attention or more buzz in the tech world? There doesn’t seem to be a lot of documentation or tutorials beyond Microsoft’s AdventureWorks models. But it can be a very simple solution for predictions or analytic uses. Here’s my very simple explanation for setting up a sequence cluster model to predict URL paths. There will be 3 parts and today’s post is starting at the very beginning explaining how to model your data. You can have a sequence cluster up and running within an hour.

Correctly modelling your data is the hardest step of the whole process.  However, once you understand the requirements and set up your tables correctly, the rest of the process is easy. The sequence cluster model requires two tables.  The first is a case table which holds a key to identify each visit and other non-sequenced attributes, such as a customer identifier, visit date and customer type.  Here’s an example of what is in this table .

VisitId CustomerId VisitDate CustomerType
1 134123412 2-5-19 P (for Purchaser)
2 2352323 2-5-19 F (for Free Registered Users)
3 23523452345 2-5-19 N (for non-Registered Users)
4 323452345 2-5-19 P

The nested table holds the sequence data for each visit. Our data looks something like this:

VisitIdSequenceIdURL
11/level1/level2/page.aspx
12/level1/level2/page2.aspx
13 /level1/level2/page3.aspx
21/level1/level2/page4.aspx
22/level2/level2/page2.aspx

Tomorrow, I will walk you through using SQL Data Tools to set up your model.


Engage in Meaningful, Creative Work

Several years ago I read an article about the things that lead us to greater health and happiness.  Besides telling readers to volunteer and stop watching so much TV or spending so much time with social media, the article had a topic titled “Engage in Meaningful, Creative Work”.  It said to do things that challenge creativity and make us feel productive.

I haven’t always felt that the work I did is meaningful or creative.  This blog is a now going to serve as my creative outlet and if it helps someone understand SQL Server better, then that’s a plus too!