Left, Down, Page Down | Next slide |
---|---|
Right, Up, Page Up | Previous slide |
Space | Forward |
Group Study in Shanghai Jiao Tong University
- Data Collection Methods Summary - Web Data Introduction - Data Collection with AJAX - My WeChat Research
## Data Collection Methods Summary - The Traditional Methods + Questionnaire / Survey + Interview + Observation + Experiment - The Tech Methods + The Online Version of Above if Applicable + Online Observation Experiment + Data API Enquiry (数据接口取数) + Web Page Snapshot and Extraction with Scripts (网页爬虫) - Third-party Provided Data Personally, I prefer data collected via online observation experiment. When cost is taken into account, I prefer to use web crawler techniques to collect data.
## Web Data Introduction - TL;DR In general, a web page we view is the result of following processes. - User Request through a Web Browser (such as Chrome, Safari, IE, Edge and so on) + The User Request mainly contains URL(protocol+host/domain name+path+parameters), Device IP, Cookies, Browser UserAgent and more Browser Information Data - Web Server (such as Nginx, Apache and so on) recieves User Request and does some jobs + may write logs about User Request, which records data for sure + may rewrite user requested URL + may forward User Request to backend web process - Backend web process recieves the forwarded User Request and renders the web page
## Web Data Introduction - Front-end, user sensible, presented by web browser + HTML + CSS + JavaScript, AJAX - Back-end + API + Web Page Generators, Controllers, Template, ... + Database (MySQL, MongoBD, Files) - Third-Party + WeChat Web OAuth + WeChat JSSDK
## Data Collection with AJAX ### Ajax (also AJAX /ˈeɪdʒæks/; short for "Asynchronous JavaScript + XML") Concept - [Ajax (programming)](https://en.wikipedia.org/wiki/Ajax_(programming)) - [How does AJAX work?](https://stackoverflow.com/questions/1510011/how-does-ajax-work) ### Real Demo - [Data from Ajax with jQuery and PHP](http://demo.johannhuang.com/php/2018/04/ajax-with-jquery-and-php/index.html) - [Data from User Request](http://demo.johannhuang.com/php/2018/04/ajax-with-jquery-and-php/index.php)
## What Knowledge Needed? - JavaScript, especially AJAX part - HTML + CSS, better to know, not necessarily - PHP, especially parts related to retrive data and write data into database / data files - SQL, if you choose to use relationship database - Nginx, to deploy your code
## My WeChat Research - Database Tables All data tables used. mysql> show tables; +-----------------------+ | Tables_in_research_v0 | +-----------------------+ | apiauth | | articles | | pageviews | | pvlines | | users | +-----------------------+ 5 rows in set (0.00 sec)
## My WeChat Research - apiauth Tickets to get data from WeChat API mysql> select * from apiauth; +--------------+--------+-------------+ | type | value | expire_time | +--------------+--------+-------------+ | access_token | ... | 1491734629 | | jsapi_ticket | ... | 1491734629 | +--------------+--------+-------------+ 2 rows in set (0.00 sec)
## My WeChat Research - articles Data in this table decides the contents shown in the web page. mysql> desc articles; +--------------+--------------+------+-----+-------------------+-----------------------------+ | Field | Type | Null | Key | Default | Extra | +--------------+--------------+------+-----+-------------------+-----------------------------+ | id | int(11) | NO | PRI | NULL | auto_increment | | title | varchar(128) | YES | | NULL | | | description | varchar(256) | YES | | NULL | | | imgurl | varchar(512) | YES | | NULL | | | link | varchar(512) | YES | | NULL | | | contenttitle | varchar(128) | YES | | NULL | | | copyright | int(8) | YES | | 0 | | | date | date | YES | | NULL | | | author | varchar(16) | YES | | NULL | | | content | longtext | YES | | NULL | | | originallink | varchar(512) | YES | | NULL | | | pageid | varchar(32) | YES | | NULL | | | pagecreator | varchar(32) | YES | | NULL | | | createtime | timestamp | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP | +--------------+--------------+------+-----+-------------------+-----------------------------+ 14 rows in set (0.00 sec)
## My WeChat Research - users Data recieved by calling WeChat web auth api using `snsapi_userinfo` scope which would notify the current visitor. mysql> desc users; +-------------+--------------+------+-----+-------------------+-----------------------------+ | Field | Type | Null | Key | Default | Extra | +-------------+--------------+------+-----+-------------------+-----------------------------+ | id | int(11) | NO | PRI | NULL | auto_increment | | superid | varchar(64) | YES | | NULL | | | email | varchar(64) | YES | | NULL | | | phone | int(11) | YES | | NULL | | | userid | varchar(16) | YES | | NULL | | | displayname | varchar(16) | YES | | NULL | | | name | varchar(16) | YES | | NULL | | | address | varchar(128) | YES | | NULL | | | source | varchar(32) | YES | | NULL | | | unionid | varchar(64) | YES | | NULL | | | openid | varchar(64) | YES | | NULL | | | nickname | varchar(32) | YES | | NULL | | | sex | int(11) | YES | | NULL | | | language | varchar(8) | YES | | NULL | | | city | varchar(32) | YES | | NULL | | | province | varchar(32) | YES | | NULL | | | country | varchar(32) | YES | | NULL | | | headimgurl | varchar(512) | YES | | NULL | | | privilege | varchar(64) | YES | | NULL | | | storeimgurl | varchar(128) | YES | | NULL | | | remarks | varchar(512) | NO | | NULL | | | createtime | timestamp | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP | +-------------+--------------+------+-----+-------------------+-----------------------------+ 22 rows in set (0.01 sec)
## My WeChat Research - pageviews Data collected by extracting User Browser Request or API Request. mysql> desc pageviews; +------------------+--------------+------+-----+-------------------+-----------------------------+ | Field | Type | Null | Key | Default | Extra | +------------------+--------------+------+-----+-------------------+-----------------------------+ | id | int(11) | NO | PRI | NULL | auto_increment | | page | varchar(128) | YES | | NULL | | | parameters | varchar(128) | YES | | NULL | | | comefrom | varchar(64) | YES | | NULL | | | visitby | varchar(64) | YES | | NULL | | | visittime | datetime | YES | | NULL | | | iplist | varchar(64) | YES | | NULL | | | realip | varchar(16) | YES | | NULL | | | iplocation | varchar(64) | YES | | NULL | | | useragent | varchar(256) | YES | | NULL | | | device | varchar(20) | YES | | NULL | | | browser | varchar(20) | YES | | NULL | | | referrer | varchar(512) | YES | | NULL | | | networktype | varchar(10) | YES | | NULL | | | location | varchar(40) | YES | | NULL | | | txaddr | varchar(512) | YES | | NULL | | | bdaddr | varchar(512) | YES | | NULL | | | readingtime | int(11) | YES | | NULL | | | sharetimeline | datetime | YES | | NULL | | | shareappmessage | datetime | YES | | NULL | | | leavetime | datetime | YES | | NULL | | | rtclocalprivate | varchar(32) | YES | | NULL | | | rtclocalpublic | varchar(64) | YES | | NULL | | | rtclocalipv6 | varchar(128) | YES | | NULL | | | rtclocation | varchar(128) | YES | | NULL | | | remarks | varchar(512) | YES | | NULL | | | sessionid | varchar(128) | YES | | NULL | | | sessionid_qrcode | varchar(128) | YES | | NULL | | | createtime | timestamp | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP | +------------------+--------------+------+-----+-------------------+-----------------------------+ 29 rows in set (0.00 sec)
## My WeChat Research - pvlines Data collected via API. mysql> desc pvlines; +--------+----------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +--------+----------+------+-----+---------+----------------+ | id | int(11) | NO | PRI | NULL | auto_increment | | pvid | int(11) | YES | | NULL | | | lineid | int(11) | YES | | NULL | | | state | int(11) | YES | | NULL | | | time | datetime | YES | | NULL | | +--------+----------+------+-----+---------+----------------+ 5 rows in set (0.00 sec)
- [HTTP (HyperText Transfer Protocol)](https://www.ntu.edu.sg/home/ehchua/programming/webprogramming/HTTP_Basics.html) - [What is Ajax?](https://www.ibm.com/support/knowledgecenter/SSRTLW_8.5.1/com.ibm.etools.webtoolscore.doc/topics/cajax.html)
Johann Huang