Left, Down, Page DownNext slide
Right, Up, Page UpPrevious slide
SpaceForward
Data Collection with AJAX

Group Study in Shanghai Jiao Tong University

2018-04-26 / Johann Huang
Contents
- Data Collection Methods Summary
- Web Data Introduction
- Data Collection with AJAX
- My WeChat Research
        
## Data Collection Methods Summary

- The Traditional Methods
	+ Questionnaire / Survey
	+ Interview
	+ Observation
	+ Experiment
- The Tech Methods
	+ The Online Version of Above if Applicable
	+ Online Observation Experiment
	+ Data API Enquiry (数据接口取数)
	+ Web Page Snapshot and Extraction with Scripts (网页爬虫)
- Third-party Provided Data

Personally, I prefer data collected via online observation experiment. When cost is taken into account, I prefer to use web crawler techniques to collect data.
      
## Web Data Introduction - TL;DR

In general, a web page we view is the result of following processes.

- User Request through a Web Browser (such as Chrome, Safari, IE, Edge and so on)
	+ The User Request mainly contains URL(protocol+host/domain name+path+parameters), Device IP, Cookies, Browser UserAgent and more Browser Information Data
- Web Server (such as Nginx, Apache and so on) recieves User Request and does some jobs
	+ may write logs about User Request, which records data for sure
	+ may rewrite user requested URL
	+ may forward User Request to backend web process
- Backend web process recieves the forwarded User Request and renders the web page
      
## Web Data Introduction

- Front-end, user sensible, presented by web browser
	+ HTML
	+ CSS
	+ JavaScript, AJAX
- Back-end
	+ API
	+ Web Page Generators, Controllers, Template, ...
	+ Database (MySQL, MongoBD, Files)
- Third-Party
	+ WeChat Web OAuth
	+ WeChat JSSDK
      
## Data Collection with AJAX

### Ajax (also AJAX /ˈeɪdʒæks/; short for "Asynchronous JavaScript + XML") Concept

- [Ajax (programming)](https://en.wikipedia.org/wiki/Ajax_(programming))
- [How does AJAX work?](https://stackoverflow.com/questions/1510011/how-does-ajax-work)


### Real Demo

- [Data from Ajax with jQuery and PHP](http://demo.johannhuang.com/php/2018/04/ajax-with-jquery-and-php/index.html)
- [Data from User Request](http://demo.johannhuang.com/php/2018/04/ajax-with-jquery-and-php/index.php)
      
## What Knowledge Needed?

- JavaScript, especially AJAX part
- HTML + CSS, better to know, not necessarily
- PHP, especially parts related to retrive data and write data into database / data files
- SQL, if you choose to use relationship database
- Nginx, to deploy your code
      
## My WeChat Research - Database Tables

All data tables used.

    mysql> show tables;
    +-----------------------+
    | Tables_in_research_v0 |
    +-----------------------+
    | apiauth               |
    | articles              |
    | pageviews             |
    | pvlines               |
    | users                 |
    +-----------------------+
    5 rows in set (0.00 sec)
      
## My WeChat Research - apiauth

Tickets to get data from WeChat API

    mysql> select * from apiauth;
    +--------------+--------+-------------+
    | type         | value	| expire_time |
    +--------------+--------+-------------+
    | access_token | ... 	|  1491734629 |
    | jsapi_ticket | ... 	|  1491734629 |
    +--------------+--------+-------------+
    2 rows in set (0.00 sec)
      
## My WeChat Research - articles

Data in this table decides the contents shown in the web page.

    mysql> desc articles;
    +--------------+--------------+------+-----+-------------------+-----------------------------+
    | Field        | Type         | Null | Key | Default           | Extra                       |
    +--------------+--------------+------+-----+-------------------+-----------------------------+
    | id           | int(11)      | NO   | PRI | NULL              | auto_increment              |
    | title        | varchar(128) | YES  |     | NULL              |                             |
    | description  | varchar(256) | YES  |     | NULL              |                             |
    | imgurl       | varchar(512) | YES  |     | NULL              |                             |
    | link         | varchar(512) | YES  |     | NULL              |                             |
    | contenttitle | varchar(128) | YES  |     | NULL              |                             |
    | copyright    | int(8)       | YES  |     | 0                 |                             |
    | date         | date         | YES  |     | NULL              |                             |
    | author       | varchar(16)  | YES  |     | NULL              |                             |
    | content      | longtext     | YES  |     | NULL              |                             |
    | originallink | varchar(512) | YES  |     | NULL              |                             |
    | pageid       | varchar(32)  | YES  |     | NULL              |                             |
    | pagecreator  | varchar(32)  | YES  |     | NULL              |                             |
    | createtime   | timestamp    | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
    +--------------+--------------+------+-----+-------------------+-----------------------------+
    14 rows in set (0.00 sec)
      
## My WeChat Research - users

Data recieved by calling WeChat web auth api using `snsapi_userinfo` scope which would notify the current visitor.

    mysql> desc users;
    +-------------+--------------+------+-----+-------------------+-----------------------------+
    | Field       | Type         | Null | Key | Default           | Extra                       |
    +-------------+--------------+------+-----+-------------------+-----------------------------+
    | id          | int(11)      | NO   | PRI | NULL              | auto_increment              |
    | superid     | varchar(64)  | YES  |     | NULL              |                             |
    | email       | varchar(64)  | YES  |     | NULL              |                             |
    | phone       | int(11)      | YES  |     | NULL              |                             |
    | userid      | varchar(16)  | YES  |     | NULL              |                             |
    | displayname | varchar(16)  | YES  |     | NULL              |                             |
    | name        | varchar(16)  | YES  |     | NULL              |                             |
    | address     | varchar(128) | YES  |     | NULL              |                             |
    | source      | varchar(32)  | YES  |     | NULL              |                             |
    | unionid     | varchar(64)  | YES  |     | NULL              |                             |
    | openid      | varchar(64)  | YES  |     | NULL              |                             |
    | nickname    | varchar(32)  | YES  |     | NULL              |                             |
    | sex         | int(11)      | YES  |     | NULL              |                             |
    | language    | varchar(8)   | YES  |     | NULL              |                             |
    | city        | varchar(32)  | YES  |     | NULL              |                             |
    | province    | varchar(32)  | YES  |     | NULL              |                             |
    | country     | varchar(32)  | YES  |     | NULL              |                             |
    | headimgurl  | varchar(512) | YES  |     | NULL              |                             |
    | privilege   | varchar(64)  | YES  |     | NULL              |                             |
    | storeimgurl | varchar(128) | YES  |     | NULL              |                             |
    | remarks     | varchar(512) | NO   |     | NULL              |                             |
    | createtime  | timestamp    | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
    +-------------+--------------+------+-----+-------------------+-----------------------------+
    22 rows in set (0.01 sec)
      
## My WeChat Research - pageviews

Data collected by extracting User Browser Request or API Request.

    mysql> desc pageviews;
    +------------------+--------------+------+-----+-------------------+-----------------------------+
    | Field            | Type         | Null | Key | Default           | Extra                       |
    +------------------+--------------+------+-----+-------------------+-----------------------------+
    | id               | int(11)      | NO   | PRI | NULL              | auto_increment              |
    | page             | varchar(128) | YES  |     | NULL              |                             |
    | parameters       | varchar(128) | YES  |     | NULL              |                             |
    | comefrom         | varchar(64)  | YES  |     | NULL              |                             |
    | visitby          | varchar(64)  | YES  |     | NULL              |                             |
    | visittime        | datetime     | YES  |     | NULL              |                             |
    | iplist           | varchar(64)  | YES  |     | NULL              |                             |
    | realip           | varchar(16)  | YES  |     | NULL              |                             |
    | iplocation       | varchar(64)  | YES  |     | NULL              |                             |
    | useragent        | varchar(256) | YES  |     | NULL              |                             |
    | device           | varchar(20)  | YES  |     | NULL              |                             |
    | browser          | varchar(20)  | YES  |     | NULL              |                             |
    | referrer         | varchar(512) | YES  |     | NULL              |                             |
    | networktype      | varchar(10)  | YES  |     | NULL              |                             |
    | location         | varchar(40)  | YES  |     | NULL              |                             |
    | txaddr           | varchar(512) | YES  |     | NULL              |                             |
    | bdaddr           | varchar(512) | YES  |     | NULL              |                             |
    | readingtime      | int(11)      | YES  |     | NULL              |                             |
    | sharetimeline    | datetime     | YES  |     | NULL              |                             |
    | shareappmessage  | datetime     | YES  |     | NULL              |                             |
    | leavetime        | datetime     | YES  |     | NULL              |                             |
    | rtclocalprivate  | varchar(32)  | YES  |     | NULL              |                             |
    | rtclocalpublic   | varchar(64)  | YES  |     | NULL              |                             |
    | rtclocalipv6     | varchar(128) | YES  |     | NULL              |                             |
    | rtclocation      | varchar(128) | YES  |     | NULL              |                             |
    | remarks          | varchar(512) | YES  |     | NULL              |                             |
    | sessionid        | varchar(128) | YES  |     | NULL              |                             |
    | sessionid_qrcode | varchar(128) | YES  |     | NULL              |                             |
    | createtime       | timestamp    | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
    +------------------+--------------+------+-----+-------------------+-----------------------------+
    29 rows in set (0.00 sec)
      
## My WeChat Research - pvlines

Data collected via API.

    mysql> desc pvlines;
    +--------+----------+------+-----+---------+----------------+
    | Field  | Type     | Null | Key | Default | Extra          |
    +--------+----------+------+-----+---------+----------------+
    | id     | int(11)  | NO   | PRI | NULL    | auto_increment |
    | pvid   | int(11)  | YES  |     | NULL    |                |
    | lineid | int(11)  | YES  |     | NULL    |                |
    | state  | int(11)  | YES  |     | NULL    |                |
    | time   | datetime | YES  |     | NULL    |                |
    +--------+----------+------+-----+---------+----------------+
    5 rows in set (0.00 sec)
      
References
- [HTTP (HyperText Transfer Protocol)](https://www.ntu.edu.sg/home/ehchua/programming/webprogramming/HTTP_Basics.html)
- [What is Ajax?](https://www.ibm.com/support/knowledgecenter/SSRTLW_8.5.1/com.ibm.etools.webtoolscore.doc/topics/cajax.html)
        
Questions?
Thanks for your attendance!

Johann Huang