cat /tmp/foo | awk '{sum+=$1}END{print sum}'
(From Mike Masters, of course)
Monday, November 14, 2011
Friday, November 11, 2011
How to output the first line of each file in a directory
$ head -n 1 *
Sample output:
$ head -n 1 * ==> Desktop <== ==> Development <== ==> Documents <== ==> Downloads <== ==> Dropbox <== ==> Environment <== ==> Library <== ==> Movies <== ==> Music <== ==> Pictures <== ==> Public <== ==> Sites <== ==> bin <== ==> current.html <==Current IP Check Current IP Address: 63.119.11.19 ==> databases <== ==> my.cnf <== [client]
(That's the directory structure of my home dir on my work machine.)
Wednesday, November 2, 2011
Notes on GROUP BY in MySQL
Here is a query I wanted to run, but I was concerned that the value of fat.rowsprocessed would not come from the same fat row as min(fat.processeddate).
My buddy Spencer pointed out that in standard SQL, if you use an aggregate function, then you have to include all the other fields you are selecting in the group by. It turns out that there is an extension to GROUP BY, and to HAVING, in MySQL that enables you to use them on a single field:
I was afraid that the db would pick any value of rowsprocessed, that it would not come from the same row that the min(processeddate) is from.
HAVING is what I wanted to use. It's also not standard SQL legal but the same MySQL extension enables this.
I ran both, exported the csv's, and diff'd them, and they gave identical results. But I suspect that was luck in this case, and that the first query would not be dependably unarbitrary. I am more comfortable with the second query, using HAVING.
select m.domain, ma.merchantacctid, ma.createddate, fat.rowsprocessed, min(fat.processeddate) from merchant_account ma join merchant m on m.merchantacctid = ma.merchantacctid join ftp_audit_trail fat on fat.merchantacctid = ma.merchantacctid where fat.processeddate > ma.createddate and fat.rowsprocessed > 0 and ma.createddate > '2009-12-31' group by fat.merchantacctid order by domain;
My buddy Spencer pointed out that in standard SQL, if you use an aggregate function, then you have to include all the other fields you are selecting in the group by. It turns out that there is an extension to GROUP BY, and to HAVING, in MySQL that enables you to use them on a single field:
MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. This means that the preceding query is legal in MySQL. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate.
11.15.3. GROUP BY and HAVING with Hidden Columns
I was afraid that the db would pick any value of rowsprocessed, that it would not come from the same row that the min(processeddate) is from.
select m.domain, ma.merchantacctid, ma.createddate, fat.rowsprocessed, fat.processeddate from merchant_account ma join merchant m on m.merchantacctid = ma.merchantacctid join ftp_audit_trail fat on fat.merchantacctid = ma.merchantacctid where fat.processeddate > ma.createddate and fat.rowsprocessed > 0 and ma.createddate > '2009-12-31' group by fat.merchantacctid having min(fat.processeddate) order by domain;
HAVING is what I wanted to use. It's also not standard SQL legal but the same MySQL extension enables this.
I ran both, exported the csv's, and diff'd them, and they gave identical results. But I suspect that was luck in this case, and that the first query would not be dependably unarbitrary. I am more comfortable with the second query, using HAVING.
Subscribe to:
Posts (Atom)