分類
程序

下載指定版本 Chromium 與其 Webdriver

以下載 Linux 平臺 119 版本 Chromium 與其 Webdriver 爲例:

首先在 Chromium 版本發佈頁 找到想要下載版本的 Branch Base Position 比如 1204232。然後在 https://commondatastorage.googleapis.com/chromium-browser-snapshots/index.html?prefix=Linux_x64/ 中搜索 1204232。如果發現搜索結果爲空,不用怕,把最後一位去掉試試,這時就會搜到 1204234。也可以再刪掉一位,就會出來更多相近的結果。點進去後下載 chrome-linux.zip 和 chromedriver_linux64.zip 就可以了。

unzip chrome-linux.zip
cd chrome-linux
#查看 Chrome 版本
./chrome --version
#Chromium 119.0.6045.0

在 Selenium 中指定 Chrome、Webdriver 與 Profile 位置

import os
import 
import datetime
import undetected_chromedriver as uc

MAX_RUNTIME_SECONDS = 240
# Check for and kill any Chrome processes that have been running for too long
for proc in psutil.process_iter():
    try:
        if "chrome" in proc.name().lower():
            create_time = proc.create_time()
            elapsed_time = time.time() - create_time
            if elapsed_time > MAX_RUNTIME_SECONDS:
                proc.kill()
                print("killed" + proc.name())
    except (psutil.NoSuchProcess, psutil.AccessDenied, psutil.ZombieProcess):
        pass

script_dir = os.path.dirname(os.path.realpath(__file__))
profile_dir = os.path.join(script_dir, "chrome_profile")
path_to_chrome_binary = os.path.abspath(
    script_dir + "/../../" + "chrome-linux64/chrome"
)
driver_executable_path = os.path.abspath(
    script_dir + "/../../" + "chromedriver-linux64/chromedriver"
)
proxy = "socks5://127.0.0.1:8080"

options = uc.ChromeOptions()
options.add_argument("--disable-notifications")
options.add_argument("--disable-gpu")
options.add_argument("--no-sandbox")
options.add_argument("--disable-dev-shm-usage")
options.add_argument("--disable-save-password-bubble")

# Limit CPU and memory usage
options.add_argument("--disable-software-rasterizer")
options.add_argument("--disable-extensions")
# options.add_argument("--disable-webgl")

options.add_argument(proxy)
options.binary_location = path_to_chrome_binary

# if os.getenv("DEBUG") != "True":
#     options.add_argument("headless=True")

prefs = {
    "credentials_enable_service": False,
    "profile.password_manager_enabled": False,
    "profile.privacy_sandbox_prompt_enabled": False,
    # "profile.managed_default_content_settings.images": 1,
}
options.add_experimental_option("prefs", prefs)

try:
    driver = uc.Chrome(
        executable_path=driver_executable_path,
        browser_executable_path=path_to_chrome_binary,
        options=options,
        version_main=119,
        user_data_dir=profile_dir,
        # use_subprocess=False,
    )
except Exception as e:
    return None


driver.set_window_size(1366, 768)
driver.set_page_load_timeout(30)

#fix timeout bug
try:
    driver.get(url)
except Exception as e:
    try:
        #send Esc key to stop loading
        driver.find_element(By.XPATH, '//body').send_keys(Keys.ESCAPE)
        time.sleep(1)
        # check page loaded
        WebDriverWait(driver, 10).until(
                EC.visibility_of_element_located(
                    (By.XPATH, "//dl[@class='dfn']")
                )
            )
    except Exception as e:
        #take a screen shot
        png_name = datetime.datetime.now().isoformat()[:19]
        driver.save_screenshot(png_name + "_load" + ".png")
        return None
分類
软件

Paseo 步數記錄軟件

散步是人生中的大事,而 Paseo 可以記錄與督促它。祝大家都有一個健康的體魄。

上個月,F-Droid 被 DNS 污染,這個月遭遇不同程度屏蔽。

分類
程序

使用 Cloudflare Worker 中轉 HTTP 請求

Cloudflare Worker 可以方便的中轉 HTTP 請求,下面示例是我之前用過的,算是密碼保護的中轉特定請求。其中的 X_Custom_PSK 算是密碼,在 Settings > Variables 設置,這樣就只有我的程序可以請求。

addEventListener("fetch", event => {

  const psk = event.request.headers.get("X_Custom_PSK");
  if (psk === X_Custom_PSK) {
    event.respondWith(handleRequest(event.request));
  }else{
    const failed_response = new Response('Sorry, you have supplied an invalid key.', {
      status: 403,
    });
    event.respondWith(failed_response);
  }
})

async function handleRequest(request) {
  const resp = await fetch('https://domain.ltd/checkpoint/list', {
      method: 'POST',
      headers: {
          'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.3 Safari/537.36',
          'Accept': '*/*',
          'Accept-Language': 'zh-CN;q=0.8,zh;q=0.6,en-US;q=0.4,en;q=0.2',
          'Origin': 'https://domain.ltd',
          'DNT': '1',
          'Connection': 'keep-alive',
          'Referer': 'https://domain.ltd/',
          'Sec-Fetch-Dest': 'empty',
          'Sec-Fetch-Mode': 'cors',
          'Sec-Fetch-Site': 'cross-site'
      },
      body: new URLSearchParams({
          'page': '1',
          'pageSize': '10',
          'latitude': '22.5',
          'longitude': '114.0',
          'queryType': '0'
      })
  });
  return resp;
}

下面這個則是用一個 worker 代理多個網站。

addEventListener("fetch", event => {
  let url=new URL(event.request.url);
  if (event.request.url.indexOf('shaman')>-1){
      url.hostname="ft.shaman.eu.org";
  }else{
      url.hostname="www.rfi.fr";
  }  
  let request=new Request(url,event.request);
  event.respondWith(fetch(request));
});
分類
程序

用Python抓取大衆點評的用戶評論

大衆點評的知识产权声明可真是霸道啊!還是自己先保存一份。下面代碼先將評論及商戶保存到sqlite數據庫,如果需要還可以導出成CSV,這樣辦公軟件就能直接打開查看了。

from bs4 import BeautifulSoup
import sys,time,random,urllib,http.cookiejar,socket,sqlite3,csv


goOn=1
stopDate=''
UserID=''
review={'shopName':'','shopAddr':'','shopURL':'','reviewURL':'','star':'',
'starDetail':'','costPerPeople':'','reviewText':'','dishes':'','reviewTime':''}

def getHTML(url):
  print("Fetching "+url)
  request = urllib.request.Request(url)
  request.add_header("User-Agent", "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:37.0) Gecko/20100101 Firefox/37.0")
  try:
    response = urllib.request.urlopen(request)
  except (urllib.error.HTTPError, socket.error,urllib.error.URLError) as e:
    print('Connection error occurred when inserting data.'+str(e))
  else:
    if response.code != 200:
      print("Error code:"+response.code)  
    else:
      html = response.read().decode('utf-8')
      return html

def getList(url):
  global review,goOn
  reviewList=getHTML(url)
  soupAll = BeautifulSoup(reviewList).find_all("div",{"class":"txt J_rptlist"})

  for soup in soupAll:
    shopLink = soup.find("a",{"class":"J_rpttitle"})
    review['shopName']=shopLink.text
    review['shopURL']=shopLink.get("href")
    
    shopAddr = soup.find("p",{"class":"col-exp"})
    review['shopAddr']=shopAddr.text
    
    reviewID = soup.find("a",{"class":"J_flower aheart"})
    review['reviewURL']="http://www.dianping.com/review/"+reviewID.get("data-id")
    
    reviewDateDiv = soup.find("div",{"class":"mode-tc info"})
    reviewDateSpan=reviewDateDiv.find("span",{"class":"col-exp"})
    reviewDate=str(reviewDateSpan.text)[3:]
    if(len(reviewDate)==8 and reviewDate>stopDate):
      getReview(review['reviewURL'])
      #抓取頻率
      time.sleep(random.randrange(5,10))
    else:
      goOn=0
  if(goOn==0):
    print("Finished.")
    exit()
    
      
    

def save():
  global review,UserID
  conn = sqlite3.connect('DZDB_'+UserID+'_Reviews.db')
  c = conn.cursor()
  c.execute("""create table if not exists reviews (ID integer primary key not NULL,shopName char(50),shopAddr char(100),shopURL char(100),reviewURL char(100),star char(1),starDetail char(15),costPerPeople char(15),reviewText TEXT,dishes char(100),reviewTime char(20))""")
  s="""insert into reviews (ID,shopName,shopAddr,shopURL,reviewURL,star,starDetail,costPerPeople,reviewText,dishes,reviewTime) VALUES (NULL,\'"""+review['shopName']+'\',\''+review['shopAddr']+'\',\''+review['shopURL']+'\',\''+review['reviewURL']+'\',\''+str(review['star'])+'\',\''+review['starDetail']+'\',\''+review['costPerPeople']+'\',\''+review['reviewText']+'\',\''+review['dishes']+'\',\''+review['reviewTime']+'\')'
  c.execute(s)
  conn.commit()
  c.close
  print("Record at "+review['shopName']+" saved to Datebase.")
  review={'shopName':'','shopAddr':'','shopURL':'','reviewURL':'','star':'',
'starDetail':'','costPerPeople':'','reviewText':'','dishes':'','reviewTime':''}

def getReview(url):
  global review
  reviewHTML=getHTML(url)
  reviewAll=BeautifulSoup(reviewHTML)
  shopInfo= reviewAll.find("ul",{"class":"contList-info"})
  star=str(shopInfo.find("li"))
  if("msstar50" in star):
    review['star']=5
  elif ("msstar40" in star):
    review['star']=4
  elif ("msstar30" in star):
    review['star']=3
  elif ("msstar20" in star):
    review['star']=2
  elif ("msstar10" in star):
    review['star']=1
  else:
    review['star']=0
  starDetails=shopInfo.find_all("span",{"class":"rst"})
  starDetail=""
  for s in starDetails:
    s1=s.text[0:3]
    starDetail=starDetail+s1
  review['starDetail']=starDetail
  
  reviewText= reviewAll.find("div",{"class":"contList-con"})
  review['reviewText']=reviewText.text
  units= reviewAll.find_all("div",{"class":"comment-unit"})
  for unit in units:
    unit=str(unit.text).replace('\n','')
    if("人均:" in unit):    
      review['costPerPeople']=unit[4:]
    elif("喜欢的菜:" in unit): 
      unit=unit.replace(' ','')
      unit=unit.replace('\xa0',' ')
      review['dishes']=unit[7:]
    
  reviewInfo= reviewAll.find("ul",{"class":"contList-fn"})  
  reviewTime=reviewInfo.find("li")
  review['reviewTime']=reviewTime.text
  save() 

def main():
  fun=int(input("请输入数字选择功能:\n[1]抓取数据,[2]导出数据: \n"))
  if(fun==1):
    fetchReview()
  elif(fun==2):
    sqliteToCSV()
  else:
    print("请输入1或2。")

    
def sqliteToCSV():
  dbFile=str(input("请输入数据库文件名:\n"))
  with open(dbFile+'.csv','w+',newline='') as csvfile:
    spamwriter = csv.writer(csvfile)
    conn=sqlite3.connect(dbFile)
    c = conn.cursor()
    spamwriter.writerow(['ID','shopName','shopAddr','shopURL','reviewURL','star',
'starDetail','costPerPeople','reviewText','dishes','reviewTime'])
    for row in c.execute('SELECT * FROM reviews'):
      spamwriter.writerow(row)
    c.close()
    print("CSV文件成功導出。")
    
def fetchReview():
  #抓取参数:用户ID,起始页,结束日期
  global stopDate,UserID
  UserID=str(input("请输入您的大众点评ID,可以在您大众点评主页的网址中看到,如23262500:\n"))
  startPageNo=int(input("开始的页码,如1:\n"))
  stopDate=str(input("请输入评论结束日期(yy-mm-dd),如00-00-00:\n"))
  
  urlBase="http://www.dianping.com/member/"+UserID+"/reviews?pg="
  startPageNo=startPageNo-1
  while(goOn==1):
    startPageNo=startPageNo+1
    getList(urlBase+str(startPageNo))
    
if __name__ == "__main__":
    main()
幾點說明
  • 抓取頻率不要過大,否則大衆點評會屏蔽IP。我在抓取到20頁左右的時候碰到過一次屏蔽IP。如果意外中斷,你可以設置參數繼續下載,附w3school的SQL基礎教程
  • BeautifulSoup真是個好工具,連Qpython3都自帶了,但是遺憾的是這個代碼在Qpython3上跑報NoneType錯誤。
  • 我用了幾次都沒問題。
分類
程序

quick cocos2d win7下环境搭建

所需软件:

  • Win7 x64
  • Quick-Cocos2d-x v3.2-RC1,正常安装即可。
  • JDK,我用的是1.7.0_65,这个最新版应该也可以。安装后设置环境变量,可参考Ubuntu安装SunJava
  • adt-bundle-windows-x86_64-20140702,打开eclipse,帮助-安装新文件填入http://download.eclipse.org/koneki/releases/stable,我是把搜到的都安装了。另外首次创建虚拟机的时候可能要连外网,我用的Psiphon,顺利下载了4.4.4的镜像。有一个很好用的代理mirrors.neusoft.edu.cn:80,速度超快,推荐用这个!记得勾上强制使用http的选项然后重启Android SDK Manager。
  • android-ndk-r9d-windows-x86_64,必须是r9d,因为还不支持r10。SDK 和 NDK 不能放在包含中文和空格的目录中。 SDK/NDK 必须和 quick 的文件放在同一个分区中。请参考编译 Android 工程

安装完成后,打开桌面上的player3,新建项目.选择位置输入包名,选择方向即可。将X:\\cocos\quick-cocos2d-x-3.2rc0\cocos\platform\android\java\src下的org文件夹复制到你项目的src下。也可先将这个导入工作区,然后在自己项目中引用,详细参见Quick-Coco2d-x开发环境搭建。然后运行proj.android文件夹中的build_native.bat。最后打开adt中的eclipse,导入安卓项目,选中proj.android导入。点菜单栏中的运行-运行就可以在虚拟机中调试了。

关于导出apk可参考在eclipse中将android项目生成apk并且给apk签名。需要一提的是,要在项目文件上右键,导出。

本文更新於 2014/10/13。

分類
程序

把照片重命名为拍摄时间

主要问题就是获取照片的EXIF信息中的拍摄时间。Java的话推荐Drew Noakes的metadata-extractor。python3用Pillow就行了。另外还有一个好用的python3库可以获取exif信息,EXIF.py Python3 port,短小精悍可以用在qpython3上。

实现了下java,成品和源码都在https://github.com/pggdt/rename-JPG-to-Date。先基于后缀名判断下是否是jpg文件,然后有exif信息的重命名为“年年月月日日-时时分分秒秒”格式,如果同一秒还有照片就在文件名后加1。如果勾选了重命名为最后修改时间,则会继续重命名没有exif信息的照片,将它们的名字改为图片的最后修改日期。

分類
程序

Eclipse用法

我是在ubuntu12.04上用的adt-bundle,安装好Sun Java后解压就可以使用了。关于ubuntu12.04上安装Sun Java,参考Ubuntu安装SunJava

为Eclipse安装WindowBuilder

In order to install WindowBuilder on Eclipse 4.3 (Kepler) Modeling let’s click on Help -> Install New Software… and choose (into “Work with:” field ) the main Kepler Update Site (http://download.eclipse.org/releases/kepler), expand the Category “General Purpose Tools” (make sure that “Group items by Category” flag be selected) and choose the following items:

  • SWT Designer
  • SWT Designer Core
  • WindowBuilder Core
  • WindowBuilder Core UI
  • WindowBuilder GroupLayout Support
  • WindowBuilder Java Core

Then press the Next button and follow the wizard until it asks to restart Eclipse. Let’s accept pressing Restart Now button. If you download instead the package Eclipse 4.3 (Kepler) for RCP and RAP Developers WindowBuilder is included.