sqlite3学习笔记1

获得数据

从tatoeba获得sentences_detailed.csvlinks.csv两个文件。

建立表格

sadhen@archlinux $ sqlite tatoeba.db
sqlite> CREATE TABLE sentences (id integer primary key, language text, sentence text, user text, date text, modify text);

导入数据

sqlite> .separator "\t"
sqlite> .import sentences_detailed.csv sentences

Note: sqlite 是C语言写的,所以理所当然可以使用C语言的转义符。

分析数据

格式:

select [distinct] heading
from tables
group by columns
having predicate
order by columns
limit count, offset;

统计数据库中中文语句个数(函数)

sqlite> select count(*) from sentences where language='cmn';

Note:

count是函数(Function),是函数中一种聚合(Aggregate)

所谓聚合即关注整体整体的特性,如avg(得到一组数的平均数)。

统计数据库中各语言的个数(分组)

sqlite> select language,count(id) from sentences group by language;

统计语言榜

sqlite> select * from (select language,count(id) from sentences group by language) order by "count(id)" desc limit 10 offset 1;

Note:

  1. select的嵌套,“一切皆表格”。
  2. 原来是这样排序的!! order by
  3. limit 10 就是 top 10
  4. offset 1, 第一名毫无疑问是英语,我们把它忽略掉

句子超过一万的语言们

sqlite> select language,count(id) from sentences group by language having count(id) > 10000;

Note:

注意having和where的作用对象不同之处。