Group Study at Shanghai Jiao Tong University
- Data Collection Methods Summary
- Web Data Introduction
- Data Collection with AJAX
- My WeChat Research
## Data Collection Methods Summary
- The Traditional Methods
+ Questionnaire / Survey
+ Interview
+ Observation
+ Experiment
- The Tech Methods
+ Online Versions of the Above, Where Applicable
+ Online Observation Experiment
+ Data API Queries
+ Web Page Snapshot and Extraction with Scripts (Web Crawling)
- Third-party Provided Data
Personally, I prefer data collected through online observation experiments. When cost is a concern, I prefer to collect data with web crawler techniques.
## Web Data Introduction - TL;DR
In general, a web page we view is the result of the following steps.
- User Request through a Web Browser (such as Chrome, Safari, IE, Edge, and so on)
+ The User Request mainly contains the URL (protocol + host/domain name + path + parameters), Device IP, Cookies, Browser UserAgent, and other browser information
- Web Server (such as Nginx, Apache, and so on) receives the User Request and does some work
+ may write logs about the User Request, which definitely records data
+ may rewrite the requested URL
+ may forward the User Request to a backend web process
- Backend web process receives the forwarded User Request and renders the web page
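As a rough illustration of the URL structure described above (protocol + host/domain name + path + parameters), the sketch below splits a URL with the standard `URL` class available in browsers and Node.js. The function name `splitRequestUrl` and the example URL are assumptions made for illustration only.

```javascript
// Split a request URL into the parts listed above.
// The function name and the example URL are hypothetical.
function splitRequestUrl(rawUrl) {
  const url = new URL(rawUrl); // standard URL parser in browsers and Node.js
  return {
    protocol: url.protocol.replace(/:$/, ""), // "https:" -> "https"
    host: url.hostname,
    path: url.pathname,
    parameters: Object.fromEntries(url.searchParams.entries()),
  };
}

const parts = splitRequestUrl(
  "https://example.com/articles/view.php?id=42&from=timeline"
);
console.log(parts);
```

Device IP, Cookies, and UserAgent are not part of the URL; they travel with the request as connection data and HTTP headers.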
## Web Data Introduction
- Front-end, visible to the user, presented by the web browser
+ HTML
+ CSS
+ JavaScript, AJAX
- Back-end
+ API
+ Web Page Generators, Controllers, Template, ...
+ Database (MySQL, MongoDB, Files)
- Third-Party
+ WeChat Web OAuth
+ WeChat JSSDK
## Data Collection with AJAX
### Ajax (also AJAX /ˈeɪdʒæks/; short for "Asynchronous JavaScript + XML") Concept
- [Ajax (programming)](https://en.wikipedia.org/wiki/Ajax_(programming))
- [How does AJAX work?](https://stackoverflow.com/questions/1510011/how-does-ajax-work)
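To make the concept concrete, here is a minimal sketch of an AJAX call that sends collected data to a backend in the background. It uses the browser's standard `fetch` function rather than jQuery, to stay dependency-free. The endpoint `/collect.php`, the function names, and the payload fields are all assumptions, not part of the demo.

```javascript
// Describe the AJAX request as plain data so it is easy to inspect.
// The endpoint "/collect.php" and the payload fields are hypothetical.
function buildCollectRequest(payload, endpoint = "/collect.php") {
  return {
    url: endpoint,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(payload),
    },
  };
}

// Send the request in the background; the page itself is not reloaded.
async function sendCollectRequest(payload, endpoint) {
  const { url, options } = buildCollectRequest(payload, endpoint);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.text(); // raw reply from the server
}
```

In a page, a call such as `sendCollectRequest({ page: location.pathname })` could run after the page has loaded; the backend script would then read the JSON body and store it.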
### Real Demo
- [Data from Ajax with jQuery and PHP](http://demo.johannhuang.com/php/2018/04/ajax-with-jquery-and-php/index.html)
- [Data from User Request](http://demo.johannhuang.com/php/2018/04/ajax-with-jquery-and-php/index.php)
## What Knowledge Is Needed?
- JavaScript, especially the AJAX part
- HTML + CSS, helpful but not required
- PHP, especially the parts for retrieving data and writing it to a database or data files
- SQL, if you choose to use a relational database
- Nginx, to deploy your code
## My WeChat Research - Database Tables
All data tables used.
mysql> show tables;
+-----------------------+
| Tables_in_research_v0 |
+-----------------------+
| apiauth |
| articles |
| pageviews |
| pvlines |
| users |
+-----------------------+
5 rows in set (0.00 sec)
## My WeChat Research - apiauth
Tickets used to get data from the WeChat API.
mysql> select * from apiauth;
+--------------+--------+-------------+
| type | value | expire_time |
+--------------+--------+-------------+
| access_token | ... | 1491734629 |
| jsapi_ticket | ... | 1491734629 |
+--------------+--------+-------------+
2 rows in set (0.00 sec)
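Since each ticket carries an `expire_time`, the backend has to decide when to fetch a new one. The sketch below shows one possible check. It assumes `expire_time` is a Unix timestamp in seconds (the 10-digit values above suggest so); the function name and the 300-second safety margin are arbitrary choices for illustration.

```javascript
// Decide whether a row from `apiauth` should be refreshed.
// Assumption: `expire_time` is a Unix timestamp in seconds.
// The 300-second safety margin is an arbitrary illustrative value.
function needsRefresh(
  row,
  nowSeconds = Math.floor(Date.now() / 1000),
  marginSeconds = 300
) {
  return Number(row.expire_time) - marginSeconds <= nowSeconds;
}

const row = { type: "access_token", value: "...", expire_time: 1491734629 };
console.log(needsRefresh(row, 1491730000)); // false: well before expiry
console.log(needsRefresh(row, 1491734629)); // true: at the expiry time
```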
## My WeChat Research - articles
Data in this table determines the content shown on the web page.
mysql> desc articles;
+--------------+--------------+------+-----+-------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+--------------+------+-----+-------------------+-----------------------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| title | varchar(128) | YES | | NULL | |
| description | varchar(256) | YES | | NULL | |
| imgurl | varchar(512) | YES | | NULL | |
| link | varchar(512) | YES | | NULL | |
| contenttitle | varchar(128) | YES | | NULL | |
| copyright | int(8) | YES | | 0 | |
| date | date | YES | | NULL | |
| author | varchar(16) | YES | | NULL | |
| content | longtext | YES | | NULL | |
| originallink | varchar(512) | YES | | NULL | |
| pageid | varchar(32) | YES | | NULL | |
| pagecreator | varchar(32) | YES | | NULL | |
| createtime | timestamp | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+--------------+--------------+------+-----+-------------------+-----------------------------+
14 rows in set (0.00 sec)
## My WeChat Research - users
Data received by calling the WeChat web authorization API with the `snsapi_userinfo` scope, which notifies the current visitor.
mysql> desc users;
+-------------+--------------+------+-----+-------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+-------------------+-----------------------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| superid | varchar(64) | YES | | NULL | |
| email | varchar(64) | YES | | NULL | |
| phone | int(11) | YES | | NULL | |
| userid | varchar(16) | YES | | NULL | |
| displayname | varchar(16) | YES | | NULL | |
| name | varchar(16) | YES | | NULL | |
| address | varchar(128) | YES | | NULL | |
| source | varchar(32) | YES | | NULL | |
| unionid | varchar(64) | YES | | NULL | |
| openid | varchar(64) | YES | | NULL | |
| nickname | varchar(32) | YES | | NULL | |
| sex | int(11) | YES | | NULL | |
| language | varchar(8) | YES | | NULL | |
| city | varchar(32) | YES | | NULL | |
| province | varchar(32) | YES | | NULL | |
| country | varchar(32) | YES | | NULL | |
| headimgurl | varchar(512) | YES | | NULL | |
| privilege | varchar(64) | YES | | NULL | |
| storeimgurl | varchar(128) | YES | | NULL | |
| remarks | varchar(512) | NO | | NULL | |
| createtime | timestamp | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+-------------+--------------+------+-----+-------------------+-----------------------------+
22 rows in set (0.01 sec)
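The authorization flow starts by redirecting the visitor to a WeChat URL that names the requested scope. The sketch below builds such a URL as I understand the WeChat web authorization documentation; the endpoint and parameter names should be verified against the official documentation before use. The appid, redirect URI, and function name are placeholders.

```javascript
// Build the URL that starts the `snsapi_userinfo` authorization flow.
// Endpoint and parameter names follow my reading of the WeChat docs;
// the appid and redirect URI passed in are placeholders.
function buildAuthorizeUrl(appId, redirectUri, state = "") {
  const query = new URLSearchParams({
    appid: appId,
    redirect_uri: redirectUri, // percent-encoded by URLSearchParams
    response_type: "code",
    scope: "snsapi_userinfo",
    state,
  });
  return `https://open.weixin.qq.com/connect/oauth2/authorize?${query}#wechat_redirect`;
}

console.log(
  buildAuthorizeUrl("YOUR_APPID", "https://example.com/callback.php", "demo")
);
```

After the visitor agrees, WeChat redirects back to the redirect URI with a `code` parameter, which the backend exchanges for the user data stored in this table.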
## My WeChat Research - pageviews
Data collected by extracting fields from the user's browser request or from API requests.
mysql> desc pageviews;
+------------------+--------------+------+-----+-------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+------------------+--------------+------+-----+-------------------+-----------------------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| page | varchar(128) | YES | | NULL | |
| parameters | varchar(128) | YES | | NULL | |
| comefrom | varchar(64) | YES | | NULL | |
| visitby | varchar(64) | YES | | NULL | |
| visittime | datetime | YES | | NULL | |
| iplist | varchar(64) | YES | | NULL | |
| realip | varchar(16) | YES | | NULL | |
| iplocation | varchar(64) | YES | | NULL | |
| useragent | varchar(256) | YES | | NULL | |
| device | varchar(20) | YES | | NULL | |
| browser | varchar(20) | YES | | NULL | |
| referrer | varchar(512) | YES | | NULL | |
| networktype | varchar(10) | YES | | NULL | |
| location | varchar(40) | YES | | NULL | |
| txaddr | varchar(512) | YES | | NULL | |
| bdaddr | varchar(512) | YES | | NULL | |
| readingtime | int(11) | YES | | NULL | |
| sharetimeline | datetime | YES | | NULL | |
| shareappmessage | datetime | YES | | NULL | |
| leavetime | datetime | YES | | NULL | |
| rtclocalprivate | varchar(32) | YES | | NULL | |
| rtclocalpublic | varchar(64) | YES | | NULL | |
| rtclocalipv6 | varchar(128) | YES | | NULL | |
| rtclocation | varchar(128) | YES | | NULL | |
| remarks | varchar(512) | YES | | NULL | |
| sessionid | varchar(128) | YES | | NULL | |
| sessionid_qrcode | varchar(128) | YES | | NULL | |
| createtime | timestamp | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+------------------+--------------+------+-----+-------------------+-----------------------------+
29 rows in set (0.00 sec)
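As an illustration of how a few of these columns could be filled from a single request, the sketch below maps a request-like object to a record. The mapping (for example `page` from the URL path and `referrer` from the `Referer` header) is my assumption about what the columns hold, and the shape of the `request` object is hypothetical.

```javascript
// Map a request-like object to some `pageviews` columns.
// The column meanings and the shape of `request` are assumptions.
function toPageviewRecord(request, now = new Date()) {
  const url = new URL(request.url);
  return {
    page: url.pathname,
    parameters: url.search.replace(/^\?/, ""), // query string without "?"
    useragent: request.headers["user-agent"] || null,
    referrer: request.headers["referer"] || null,
    realip: request.ip || null,
    // UTC time formatted as "YYYY-MM-DD HH:MM:SS" for a datetime column
    visittime: now.toISOString().slice(0, 19).replace("T", " "),
  };
}

const record = toPageviewRecord({
  url: "https://example.com/articles/view.php?id=42",
  headers: { "user-agent": "ExampleBrowser/1.0" },
  ip: "203.0.113.7",
});
console.log(record);
```

Columns such as `readingtime` or `leavetime` cannot come from the initial request; they would have to be reported later, for instance by an AJAX call from the page.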
## My WeChat Research - pvlines
Data collected via API.
mysql> desc pvlines;
+--------+----------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------+----------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| pvid | int(11) | YES | | NULL | |
| lineid | int(11) | YES | | NULL | |
| state | int(11) | YES | | NULL | |
| time | datetime | YES | | NULL | |
+--------+----------+------+-----+---------+----------------+
5 rows in set (0.00 sec)
- [HTTP (HyperText Transfer Protocol)](https://www.ntu.edu.sg/home/ehchua/programming/webprogramming/HTTP_Basics.html)
- [What is Ajax?](https://www.ibm.com/support/knowledgecenter/SSRTLW_8.5.1/com.ibm.etools.webtoolscore.doc/topics/cajax.html)
Johann Huang