{"id":2356,"date":"2014-02-28T17:54:16","date_gmt":"2014-03-01T00:54:16","guid":{"rendered":"http:\/\/www.sheer.us\/weblogs\/?p=2356"},"modified":"2014-02-28T17:54:16","modified_gmt":"2014-03-01T00:54:16","slug":"i-explain-sql-to-a-beginner","status":"publish","type":"post","link":"https:\/\/www.sheer.us\/weblogs\/uncategorized\/i-explain-sql-to-a-beginner","title":{"rendered":"I explain SQL to a beginner"},"content":{"rendered":"<p>I recently gave this explanation of SQL to a beginner, and I thought it was good enough to save for posterity. This is from a Skype Chat.<\/p>\n<p>[2\/28\/14 3:56:05 PM] Sheer Pullen: The real thing to understand is that SQL is just a very formal way of asking questions<br \/>\n[2\/28\/14 3:56:42 PM] Sheer Pullen: and putting information into a system to enable you to ask it questions<br \/>\n[2\/28\/14 3:57:08 PM] Sheer Pullen: The basic form of information storage in such a system is the table, which is basically just like a excel worksheet<br \/>\nTables have rows and columns &#8211; a column is a datum about something, generally, where a row is a instance of that something.<br \/>\nSo if you had a list of customers, each customer would be a row and details about them, like their name or number would be a column<br \/>\nIn order to make joining easier, a convention has grown up to use &#8220;keys&#8221;, which are unique numbers or strings that help identify a row<br \/>\n[2\/28\/14 3:59:20 PM] Sheer Pullen: in most systems these are just a number, called a identity, that automatically increments.. grows by a set number each row you add &#8211; generally 1<br \/>\n[2\/28\/14 3:59:33 PM] Sheer Pullen: foreign keys point to values in other tables or even other databases<br \/>\n[2\/28\/14 3:59:51 PM] Sheer Pullen: so if you had a table that was orders, and a table that was customers, orders might have a foreign key to point to he customer<br \/>\n[2\/28\/14 4:00:00 PM] Sheer Pullen: because it&#8217;s &#8220;foreign&#8221; to the orders table..<br \/>\n[2\/28\/14 4:00:36 PM] Sheer Pullen: SQL has a few very basic verbs that you use for almost all operations<br \/>\n[2\/28\/14 4:00:44 PM] Sheer Pullen: SELECT, for looking up a value &#8211; asking a question<br \/>\n[2\/28\/14 4:00:47 PM] Sheer Pullen: INSERT, for adding a row<br \/>\n[2\/28\/14 4:00:50 PM] Sheer Pullen: DELETE, for removing a row<br \/>\n[2\/28\/14 4:00:54 PM] Sheer Pullen: and UPDATE for changing a row<br \/>\nthere are also some verbs you use only when you&#8217;re setting up a table, like CREATE TABLE and CREATE INDEX<br \/>\n[2\/28\/14 4:01:13 PM] (name deleted): insert, delete and update sound scary to a newbie<br \/>\n[2\/28\/14 4:01:20 PM] Sheer Pullen: yes, as a newcomer you start with SELECT<br \/>\n[2\/28\/14 4:01:29 PM] Sheer Pullen: especially if you&#8217;re working on a production database \ud83d\ude09<br \/>\n[2\/28\/14 4:02:01 PM] Sheer Pullen: and as a newcomer, you don&#8217;t work directly on prod databases at places like (medium-sized client), you work on what is called a read replica, which is a clone<br \/>\n[2\/28\/14 4:02:05 PM] Sheer Pullen: that way you can&#8217;t actually damage anything<br \/>\n[2\/28\/14 4:02:15 PM] Sheer Pullen: even if you ask a question that takes more resources than the server has to answer<br \/>\n[2\/28\/14 4:02:33 PM] Sheer Pullen: (it is possible to hurt (our production main server) with a poorly chosen SELECT just by asking it to use more resources than it has)<br \/>\n[2\/28\/14 4:02:54 PM] (name deleted): server down<br \/>\n[2\/28\/14 4:02:56 PM] (name deleted): crash<br \/>\n[2\/28\/14 4:02:57 PM] (name deleted): boom<br \/>\n[2\/28\/14 4:03:00 PM] Sheer Pullen: well, no, you won&#8217;t crash the server<br \/>\n[2\/28\/14 4:03:06 PM] Sheer Pullen: but you will make it very slow for all the other people using it<br \/>\n[2\/28\/14 4:03:17 PM] (name deleted): what is a deadlock<br \/>\n[2\/28\/14 4:03:27 PM] (name deleted)t: to many wanting the same thing from same place<br \/>\n[2\/28\/14 4:03:29 PM] Sheer Pullen: Ah. This actually gets into a non-SQL concept<br \/>\n[2\/28\/14 4:03:33 PM] Sheer Pullen: but yes, that&#8217;s a very good summary<br \/>\nwith modern computers, you can be doing more than one thing at once because there&#8217;s more than one CPU<br \/>\n[2\/28\/14 4:04:02 PM] Sheer Pullen: if two CPUs both want the same disk resource at the same time, they can&#8217;t have it, because there&#8217;s only one disk<br \/>\n[2\/28\/14 4:04:09 PM] Sheer Pullen: so one has to wait for the other<br \/>\n[2\/28\/14 4:04:13 PM] Sheer Pullen: if they ask at *exactly* the same time<br \/>\n[2\/28\/14 4:04:23 PM] Sheer Pullen: then the database engine doesn&#8217;t know which of them it should serve, so it tells them both they deadlocked.<br \/>\n[2\/28\/14 4:04:27 PM] Sheer Pullen: and then they have to try again<br \/>\n[2\/28\/14 4:04:47 PM] (name deleted): so what manages that &#8220;request&#8221;<br \/>\n[2\/28\/14 4:05:03 PM] Sheer Pullen: modern database engines have a concept called ACID, which means Atomic, Consistant, Isolated, and Durable<br \/>\n[2\/28\/14 4:05:11 PM] Sheer Pullen: They will not let the code ask for something which would break it<br \/>\n[2\/28\/14 4:05:24 PM] Sheer Pullen: the database engine is what handles requests in SQL<br \/>\n[2\/28\/14 4:05:28 PM] (name deleted): are we a modern DB engine?<br \/>\n[2\/28\/14 4:05:30 PM] Sheer Pullen: We are.<br \/>\n[2\/28\/14 4:05:42 PM] Sheer Pullen: We use mysql 5.5, which is a database engine with 20 years of open source work behind it&#8217;s code<br \/>\n[2\/28\/14 4:05:47 PM] Sheer Pullen: I&#8217;ve worked on it a time or two myself<br \/>\n[2\/28\/14 4:05:51 PM] Sheer Pullen: thousands of people have<br \/>\n[2\/28\/14 4:06:15 PM] (name deleted): is there any book you recommend ie sql for dummies?<br \/>\n[2\/28\/14 4:06:25 PM] Sheer Pullen: That w3schools tutorial is a great place to start<br \/>\n[2\/28\/14 4:06:32 PM] Sheer Pullen: it has a SQL engine written in java written right into the page<br \/>\n[2\/28\/14 4:06:37 PM] Sheer Pullen: so you can try things out on the web page<br \/>\n[2\/28\/14 4:06:48 PM] (name deleted): and what is the best doc to have for (clients) schema<br \/>\n[2\/28\/14 4:06:56 PM] Sheer Pullen: *laughs* There isn&#8217;t really one.<br \/>\n[2\/28\/14 4:06:58 PM] (name deleted): saw something called ERD?<br \/>\n[2\/28\/14 4:07:02 PM] (name deleted): what is ERd?<br \/>\n[2\/28\/14 4:07:12 PM] Sheer Pullen: (our DBA) is building ERDs, which are graphical representations of the relationships between tables. ERD stands for Entity Relationship Diagram<br \/>\n[2\/28\/14 4:07:22 PM] Sheer Pullen: and you can get a copy of those and look at them and they will help you<br \/>\n[2\/28\/14 4:07:42 PM] Sheer Pullen: the &#8220;stock&#8221; ofbiz tables are documented in several Ofbiz books, I can send you links if you like<br \/>\n[2\/28\/14 4:07:56 PM] Sheer Pullen: but we have added many tables and relationships only documented in (our DBA&#8217;s) diagrams, or in some places (sadly) nowhere.<br \/>\n[2\/28\/14 4:08:18 PM] Sheer Pullen: However, a lot of those are for pretty exotic concepts<br \/>\nas a beginner, you&#8217;re going to start out asking questions like &#8220;What was the volume for this order, and who placed it&#8221;<br \/>\n[2\/28\/14 4:08:50 PM] Sheer Pullen: Actually, your very first questions will be things like &#8220;What is the name for this party ID&#8221;<br \/>\n[2\/28\/14 4:08:54 PM] (name deleted): how many active (customers) by state<br \/>\n[2\/28\/14 4:09:42 PM] Sheer Pullen: Now, if you are curious, here is how I would ask that question<br \/>\n[2\/28\/14 4:10:32 PM] Sheer Pullen: First, I&#8217;d think about what tables I needed<br \/>\n[2\/28\/14 4:10:41 PM] Sheer Pullen: I&#8217;d know I needed POSTAL_ADDRESS, because duh that&#8217;s where addresses live<br \/>\n[2\/28\/14 4:10:49 PM] Sheer Pullen: I&#8217;d know I needed (custom customer table), because that&#8217;s how I find active consultants<br \/>\n[2\/28\/14 4:11:01 PM] Sheer Pullen: I&#8217;d know i needed PARTY_CONTACT_MECH_PURPOSE, because that ties the two together<br \/>\n[2\/28\/14 4:11:13 PM] Sheer Pullen: then I&#8217;d think about HOW I needed to join the tables together<br \/>\n[2\/28\/14 4:11:44 PM] Sheer Pullen: I&#8217;d know POSTAL_ADDRESS and PARTY_CONTACT_MECH_PURPOSE needed to join on CONTACT_MECH_ID, and additionally that I wanted entries that had a CONTACT_MECH_PURPOSE_TYPE_ID of (a enum), because that&#8217;s where we store their most important address<br \/>\n[2\/28\/14 4:12:34 PM] Sheer Pullen: I&#8217;d know (custom customer table) and PARTY_CONTACT_MECH_PURPOSE needed to join on PARTY_ID, because that&#8217;s the key that describes a unique person like a consultant, and that in addition I only wanted rows from (custom customer table) that had a APPLICATION_STATUS_ID != &#8216;(Customers who are still active enum)&#8217; because I wanted ACTIVE (customers).<br \/>\n[2\/28\/14 4:13:09 PM] Sheer Pullen: In addition, I&#8217;d know I wanted to group the results, because I&#8217;m looking for a particular thing, their state<br \/>\n[2\/28\/14 4:13:15 PM] (name deleted): so first tables<br \/>\n[2\/28\/14 4:13:23 PM] (name deleted): that the where<br \/>\n[2\/28\/14 4:13:29 PM] (name deleted): then the what<br \/>\n[2\/28\/14 4:13:30 PM] Sheer Pullen: *grins*  I&#8217;m just describing the way I think through these things. You might choose a totally different way.<br \/>\n[2\/28\/14 4:13:34 PM] Sheer Pullen: but yes, that&#8217;s exactly right<br \/>\n[2\/28\/14 4:13:37 PM] Sheer Pullen: then the last one is the how<br \/>\n[2\/28\/14 4:13:48 PM] Sheer Pullen: in this particular case, I want them grouped by the state they&#8217;re in, and I want a count of that state<br \/>\nAs far as what order to join in, the true wizard joins for performance, starting with most specific first, when writing something that&#8217;s going to take a lot of database Parenthesis. For a question like that one that&#8217;s pretty easy, I generally just write it out in whatever order it comes to me.<br \/>\n[2\/28\/14 4:35:46 PM] Sheer Pullen: Two more little quick things I want to mention and then I&#8217;ll get back to &#8220;real work&#8221;<br \/>\n[2\/28\/14 4:35:48 PM] Sheer Pullen: 1) AND and OR<br \/>\n[2\/28\/14 4:36:02 PM] Sheer Pullen: these are very important concepts in every type of computer operation you will ever do<br \/>\n[2\/28\/14 4:36:13 PM] Sheer Pullen: AND means both sides are true, OR means one or the other (or both) sides are true<br \/>\n[2\/28\/14 4:36:21 PM] Sheer Pullen: There&#8217;s a third, XOR, which means ONLY one or the other side is true..<br \/>\n[2\/28\/14 4:36:29 PM] Sheer Pullen: but that&#8217;s not very often needed for stuff you&#8217;ll do<br \/>\n[2\/28\/14 4:36:33 PM] Sheer Pullen: and 2) Parenthesis<br \/>\n[2\/28\/14 4:36:53 PM] Sheer Pullen: As you probably learned in high school math, there is a order of operations which every equation.. which a SQL query is a type of.. will be evaluated on<br \/>\n[2\/28\/14 4:37:07 PM] Sheer Pullen: The way to bypass this is by placing terms in parenthesis<br \/>\n[2\/28\/14 4:37:24 PM] Sheer Pullen: if you are getting results you didn&#8217;t expect, you might add parens to expressly state the order of operations you want<br \/>\n[2\/28\/14 4:37:29 PM] Sheer Pullen: this is just like algebra..<br \/>\n[2\/28\/14 4:38:02 PM] Sheer Pullen: i.e. in algebra the OO is PEMDAS &#8211; where the P is parens, forcing anything in parenthesis to be evaluated first<br \/>\n[2\/28\/14 4:38:19 PM] Sheer Pullen: and otherwise it would be exponents, multiplication, division, addition, subtraction<br \/>\n[2\/28\/14 4:38:28 PM] Sheer Pullen: but if you want subtraction to be evaluated first you throw it in parens<br \/>\n[2\/28\/14 4:38:39 PM] Sheer Pullen: SQL is the exact same way, as is every high level programming language I know of<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I recently gave this explanation of SQL to a beginner, and I thought it was good enough to save for posterity. This is from a Skype Chat. [2\/28\/14 3:56:05 PM] Sheer Pullen: The real thing to understand is that SQL is just a very formal way of asking questions [2\/28\/14 3:56:42 PM] Sheer Pullen: and [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/www.sheer.us\/weblogs\/wp-json\/wp\/v2\/posts\/2356"}],"collection":[{"href":"https:\/\/www.sheer.us\/weblogs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.sheer.us\/weblogs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.sheer.us\/weblogs\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.sheer.us\/weblogs\/wp-json\/wp\/v2\/comments?post=2356"}],"version-history":[{"count":1,"href":"https:\/\/www.sheer.us\/weblogs\/wp-json\/wp\/v2\/posts\/2356\/revisions"}],"predecessor-version":[{"id":2357,"href":"https:\/\/www.sheer.us\/weblogs\/wp-json\/wp\/v2\/posts\/2356\/revisions\/2357"}],"wp:attachment":[{"href":"https:\/\/www.sheer.us\/weblogs\/wp-json\/wp\/v2\/media?parent=2356"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.sheer.us\/weblogs\/wp-json\/wp\/v2\/categories?post=2356"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.sheer.us\/weblogs\/wp-json\/wp\/v2\/tags?post=2356"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}