Archive for February, 2014

I explain SQL to a beginner

Friday, February 28th, 2014

I recently gave this explanation of SQL to a beginner, and I thought it was good enough to save for posterity. This is from a Skype Chat.

[2/28/14 3:56:05 PM] Sheer Pullen: The real thing to understand is that SQL is just a very formal way of asking questions
[2/28/14 3:56:42 PM] Sheer Pullen: and putting information into a system to enable you to ask it questions
[2/28/14 3:57:08 PM] Sheer Pullen: The basic form of information storage in such a system is the table, which is basically just like a excel worksheet
Tables have rows and columns – a column is a datum about something, generally, where a row is a instance of that something.
So if you had a list of customers, each customer would be a row and details about them, like their name or number would be a column
In order to make joining easier, a convention has grown up to use “keys”, which are unique numbers or strings that help identify a row
[2/28/14 3:59:20 PM] Sheer Pullen: in most systems these are just a number, called a identity, that automatically increments.. grows by a set number each row you add – generally 1
[2/28/14 3:59:33 PM] Sheer Pullen: foreign keys point to values in other tables or even other databases
[2/28/14 3:59:51 PM] Sheer Pullen: so if you had a table that was orders, and a table that was customers, orders might have a foreign key to point to he customer
[2/28/14 4:00:00 PM] Sheer Pullen: because it’s “foreign” to the orders table..
[2/28/14 4:00:36 PM] Sheer Pullen: SQL has a few very basic verbs that you use for almost all operations
[2/28/14 4:00:44 PM] Sheer Pullen: SELECT, for looking up a value – asking a question
[2/28/14 4:00:47 PM] Sheer Pullen: INSERT, for adding a row
[2/28/14 4:00:50 PM] Sheer Pullen: DELETE, for removing a row
[2/28/14 4:00:54 PM] Sheer Pullen: and UPDATE for changing a row
there are also some verbs you use only when you’re setting up a table, like CREATE TABLE and CREATE INDEX
[2/28/14 4:01:13 PM] (name deleted): insert, delete and update sound scary to a newbie
[2/28/14 4:01:20 PM] Sheer Pullen: yes, as a newcomer you start with SELECT
[2/28/14 4:01:29 PM] Sheer Pullen: especially if you’re working on a production database 😉
[2/28/14 4:02:01 PM] Sheer Pullen: and as a newcomer, you don’t work directly on prod databases at places like (medium-sized client), you work on what is called a read replica, which is a clone
[2/28/14 4:02:05 PM] Sheer Pullen: that way you can’t actually damage anything
[2/28/14 4:02:15 PM] Sheer Pullen: even if you ask a question that takes more resources than the server has to answer
[2/28/14 4:02:33 PM] Sheer Pullen: (it is possible to hurt (our production main server) with a poorly chosen SELECT just by asking it to use more resources than it has)
[2/28/14 4:02:54 PM] (name deleted): server down
[2/28/14 4:02:56 PM] (name deleted): crash
[2/28/14 4:02:57 PM] (name deleted): boom
[2/28/14 4:03:00 PM] Sheer Pullen: well, no, you won’t crash the server
[2/28/14 4:03:06 PM] Sheer Pullen: but you will make it very slow for all the other people using it
[2/28/14 4:03:17 PM] (name deleted): what is a deadlock
[2/28/14 4:03:27 PM] (name deleted)t: to many wanting the same thing from same place
[2/28/14 4:03:29 PM] Sheer Pullen: Ah. This actually gets into a non-SQL concept
[2/28/14 4:03:33 PM] Sheer Pullen: but yes, that’s a very good summary
with modern computers, you can be doing more than one thing at once because there’s more than one CPU
[2/28/14 4:04:02 PM] Sheer Pullen: if two CPUs both want the same disk resource at the same time, they can’t have it, because there’s only one disk
[2/28/14 4:04:09 PM] Sheer Pullen: so one has to wait for the other
[2/28/14 4:04:13 PM] Sheer Pullen: if they ask at *exactly* the same time
[2/28/14 4:04:23 PM] Sheer Pullen: then the database engine doesn’t know which of them it should serve, so it tells them both they deadlocked.
[2/28/14 4:04:27 PM] Sheer Pullen: and then they have to try again
[2/28/14 4:04:47 PM] (name deleted): so what manages that “request”
[2/28/14 4:05:03 PM] Sheer Pullen: modern database engines have a concept called ACID, which means Atomic, Consistant, Isolated, and Durable
[2/28/14 4:05:11 PM] Sheer Pullen: They will not let the code ask for something which would break it
[2/28/14 4:05:24 PM] Sheer Pullen: the database engine is what handles requests in SQL
[2/28/14 4:05:28 PM] (name deleted): are we a modern DB engine?
[2/28/14 4:05:30 PM] Sheer Pullen: We are.
[2/28/14 4:05:42 PM] Sheer Pullen: We use mysql 5.5, which is a database engine with 20 years of open source work behind it’s code
[2/28/14 4:05:47 PM] Sheer Pullen: I’ve worked on it a time or two myself
[2/28/14 4:05:51 PM] Sheer Pullen: thousands of people have
[2/28/14 4:06:15 PM] (name deleted): is there any book you recommend ie sql for dummies?
[2/28/14 4:06:25 PM] Sheer Pullen: That w3schools tutorial is a great place to start
[2/28/14 4:06:32 PM] Sheer Pullen: it has a SQL engine written in java written right into the page
[2/28/14 4:06:37 PM] Sheer Pullen: so you can try things out on the web page
[2/28/14 4:06:48 PM] (name deleted): and what is the best doc to have for (clients) schema
[2/28/14 4:06:56 PM] Sheer Pullen: *laughs* There isn’t really one.
[2/28/14 4:06:58 PM] (name deleted): saw something called ERD?
[2/28/14 4:07:02 PM] (name deleted): what is ERd?
[2/28/14 4:07:12 PM] Sheer Pullen: (our DBA) is building ERDs, which are graphical representations of the relationships between tables. ERD stands for Entity Relationship Diagram
[2/28/14 4:07:22 PM] Sheer Pullen: and you can get a copy of those and look at them and they will help you
[2/28/14 4:07:42 PM] Sheer Pullen: the “stock” ofbiz tables are documented in several Ofbiz books, I can send you links if you like
[2/28/14 4:07:56 PM] Sheer Pullen: but we have added many tables and relationships only documented in (our DBA’s) diagrams, or in some places (sadly) nowhere.
[2/28/14 4:08:18 PM] Sheer Pullen: However, a lot of those are for pretty exotic concepts
as a beginner, you’re going to start out asking questions like “What was the volume for this order, and who placed it”
[2/28/14 4:08:50 PM] Sheer Pullen: Actually, your very first questions will be things like “What is the name for this party ID”
[2/28/14 4:08:54 PM] (name deleted): how many active (customers) by state
[2/28/14 4:09:42 PM] Sheer Pullen: Now, if you are curious, here is how I would ask that question
[2/28/14 4:10:32 PM] Sheer Pullen: First, I’d think about what tables I needed
[2/28/14 4:10:41 PM] Sheer Pullen: I’d know I needed POSTAL_ADDRESS, because duh that’s where addresses live
[2/28/14 4:10:49 PM] Sheer Pullen: I’d know I needed (custom customer table), because that’s how I find active consultants
[2/28/14 4:11:01 PM] Sheer Pullen: I’d know i needed PARTY_CONTACT_MECH_PURPOSE, because that ties the two together
[2/28/14 4:11:13 PM] Sheer Pullen: then I’d think about HOW I needed to join the tables together
[2/28/14 4:11:44 PM] Sheer Pullen: I’d know POSTAL_ADDRESS and PARTY_CONTACT_MECH_PURPOSE needed to join on CONTACT_MECH_ID, and additionally that I wanted entries that had a CONTACT_MECH_PURPOSE_TYPE_ID of (a enum), because that’s where we store their most important address
[2/28/14 4:12:34 PM] Sheer Pullen: I’d know (custom customer table) and PARTY_CONTACT_MECH_PURPOSE needed to join on PARTY_ID, because that’s the key that describes a unique person like a consultant, and that in addition I only wanted rows from (custom customer table) that had a APPLICATION_STATUS_ID != ‘(Customers who are still active enum)’ because I wanted ACTIVE (customers).
[2/28/14 4:13:09 PM] Sheer Pullen: In addition, I’d know I wanted to group the results, because I’m looking for a particular thing, their state
[2/28/14 4:13:15 PM] (name deleted): so first tables
[2/28/14 4:13:23 PM] (name deleted): that the where
[2/28/14 4:13:29 PM] (name deleted): then the what
[2/28/14 4:13:30 PM] Sheer Pullen: *grins* I’m just describing the way I think through these things. You might choose a totally different way.
[2/28/14 4:13:34 PM] Sheer Pullen: but yes, that’s exactly right
[2/28/14 4:13:37 PM] Sheer Pullen: then the last one is the how
[2/28/14 4:13:48 PM] Sheer Pullen: in this particular case, I want them grouped by the state they’re in, and I want a count of that state
As far as what order to join in, the true wizard joins for performance, starting with most specific first, when writing something that’s going to take a lot of database Parenthesis. For a question like that one that’s pretty easy, I generally just write it out in whatever order it comes to me.
[2/28/14 4:35:46 PM] Sheer Pullen: Two more little quick things I want to mention and then I’ll get back to “real work”
[2/28/14 4:35:48 PM] Sheer Pullen: 1) AND and OR
[2/28/14 4:36:02 PM] Sheer Pullen: these are very important concepts in every type of computer operation you will ever do
[2/28/14 4:36:13 PM] Sheer Pullen: AND means both sides are true, OR means one or the other (or both) sides are true
[2/28/14 4:36:21 PM] Sheer Pullen: There’s a third, XOR, which means ONLY one or the other side is true..
[2/28/14 4:36:29 PM] Sheer Pullen: but that’s not very often needed for stuff you’ll do
[2/28/14 4:36:33 PM] Sheer Pullen: and 2) Parenthesis
[2/28/14 4:36:53 PM] Sheer Pullen: As you probably learned in high school math, there is a order of operations which every equation.. which a SQL query is a type of.. will be evaluated on
[2/28/14 4:37:07 PM] Sheer Pullen: The way to bypass this is by placing terms in parenthesis
[2/28/14 4:37:24 PM] Sheer Pullen: if you are getting results you didn’t expect, you might add parens to expressly state the order of operations you want
[2/28/14 4:37:29 PM] Sheer Pullen: this is just like algebra..
[2/28/14 4:38:02 PM] Sheer Pullen: i.e. in algebra the OO is PEMDAS – where the P is parens, forcing anything in parenthesis to be evaluated first
[2/28/14 4:38:19 PM] Sheer Pullen: and otherwise it would be exponents, multiplication, division, addition, subtraction
[2/28/14 4:38:28 PM] Sheer Pullen: but if you want subtraction to be evaluated first you throw it in parens
[2/28/14 4:38:39 PM] Sheer Pullen: SQL is the exact same way, as is every high level programming language I know of