Tutorial: Scraping Web Information with Ruby Programs

2019-09-25 09:45:09 王冬梅

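After authenticating, you can issue REST requests through the access token object. The response is a typical HTTP response, so you can parse its body as a JSON object. You can then iterate over that JSON object to extract the data you are interested in.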
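The parse-and-iterate step can be tried offline before any credentials are involved. The following is a minimal sketch that uses a hard-coded JSON string in place of a real response body. The sample payload and the `company_names` helper are illustrative assumptions, shaped like the company-suggestion response handled later in Listing 4, and are not part of the LinkedIn API.

```ruby
require 'json'

# Hypothetical sample body: a top-level "values" array of objects,
# each carrying a "name" key (the shape Listing 4 iterates over).
SAMPLE_BODY = '{"values":[{"name":"Linaro"},{"name":"Wind River"}]}'

# Parse a JSON response body and collect the company names.
# Returns an empty array when the "values" key is absent.
def company_names(body)
  parsed = JSON.parse(body)
  parsed.fetch('values', []).map { |company| company['name'] }
end

company_names(SAMPLE_BODY).each { |name| puts " #{name}" }
```

Using `fetch` with a default keeps the iteration from failing on a response that carries no suggestions.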

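The Ruby script in Listing 4 gives an authenticated LinkedIn user recommendations for companies to follow as well as job suggestions.
Listing 4. Viewing company and job suggestions with the LinkedIn API (lkdin.rb)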

#!/usr/bin/ruby
require 'rubygems'
require 'oauth'
require 'json'

pquery = 'http://api.linkedin.com/v1/people/~?format=json'
cquery = 'http://api.linkedin.com/v1/people/~/suggestions/to-follow/companies?format=json'
jquery = 'http://api.linkedin.com/v1/people/~/suggestions/job-suggestions?format=json'
 
# Fill the keys and secrets you retrieved after registering your app
api_key = 'api key'
api_secret = 'api secret'
user_token = 'user token'
user_secret = 'user secret'
 
# Specify LinkedIn API endpoint
configuration = { :site => 'https://api.linkedin.com' }
 
# Use the API key and secret to instantiate consumer object
consumer = OAuth::Consumer.new(api_key, api_secret, configuration)
 
# Use the developer token and secret to instantiate access token object
access_token = OAuth::AccessToken.new(consumer, user_token, user_secret)

# Get the username for this profile
response = access_token.get(pquery)
jresp = JSON.parse(response.body)
myName = "#{jresp['firstName']} #{jresp['lastName']}"
puts "\nSuggested companies to follow for #{myName}"

# Get the suggested companies to follow
response = access_token.get(cquery)
jresp = JSON.parse(response.body)

# Iterate through each and display the company name
jresp['values'].each do | company |
  puts " #{company['name']}"
end

# Get the job suggestions
response = access_token.get(jquery)
jresp = JSON.parse(response.body)
puts "\nSuggested jobs for #{myName}"

# Iterate through each suggested job and print the company name and location
jresp['jobs']['values'].each do | job |
  puts " #{job['company']['name']} in #{job['locationDescription']}"
end

puts "\n"

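The console session in Listing 5 shows the output of running the Ruby script from Listing 4. The script makes three separate calls to the LinkedIn API, each with different output: one retrieves the authenticated user's profile, and the other two retrieve the company suggestions and the job suggestions.
Listing 5. Demonstrating the LinkedIn Ruby script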

$ ./lkdin.rb

Suggested companies to follow for M. Tim Jones
 Open Kernel Labs, Inc.
 Linaro
 Wind River
 DDC-I
 Linsyssoft Technologies
 Kalray
 American Megatrends
 JetHead Development
 Evidence Srl
 Aizyc Technology

Suggested jobs for M. Tim Jones
 Kozio in Greater Denver Area
 Samsung Semiconductor Inc in San Jose, CA
 Terran Systems in Sunnyvale, CA
 Magnum Semiconductor in San Francisco Bay Area
 RGB Spectrum in Alameda, CA
 Aptina in San Francisco Bay Area
 CyberCoders in San Francisco, CA
 CyberCoders in Alameda, CA
 SanDisk in Longmont, CO
 SanDisk in Longmont, CO

$
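
The script in Listing 4 assumes that every call succeeds and that every key is present. As a hedged sketch of a more defensive version of the job-suggestion step, the helper below checks the status code and tolerates missing keys. It assumes the response object exposes `code` and `body` the way Ruby's Net::HTTP responses do. `FakeResponse` and `job_lines` are hypothetical names introduced only for illustration.

```ruby
require 'json'

# Hypothetical stand-in for an HTTP response: anything with #code and #body.
FakeResponse = Struct.new(:code, :body)

# Build "Company in Location" strings from a job-suggestion response.
# Returns [] for a non-200 status, invalid JSON, or a missing "jobs" section.
def job_lines(response)
  return [] unless response.code.to_s == '200'

  parsed = JSON.parse(response.body)
  return [] unless parsed.is_a?(Hash)

  jobs = parsed.dig('jobs', 'values') || []
  jobs.map do |job|
    "#{job.dig('company', 'name')} in #{job['locationDescription']}"
  end
rescue JSON::ParserError
  []
end

ok = FakeResponse.new('200',
  '{"jobs":{"values":[{"company":{"name":"Kozio"},' \
  '"locationDescription":"Greater Denver Area"}]}}')
job_lines(ok).each { |line| puts " #{line}" }
```

Returning an empty array on failure lets the calling loop stay unchanged. A real script might log the status code instead of discarding it silently.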

You can use the LinkedIn API with any language that provides OAuth support.