neo4j :rel
几个周末前,我开始抓取拜仁慕尼黑/巴塞罗那比赛的BBC直播文本提要,最初只是从犯规开始,然后建立犯规图表。
从那以后,我花了更多时间,并设法对其他一些事件进行建模,包括尝试,进球,发球和任意球。
我开始只是为了拜仁慕尼黑/巴塞罗那的比赛而做,但是意识到扩大这一范围并为2014/2015冠军联赛的每场比赛绘制图表并不是特别困难。
为此,我们首先需要下载每个匹配项的页面。 我下载了此页面,并编写了一个简单的Python脚本来获取适当的URI:
from bs4 import BeautifulSoup
from soupselect import select
import bs4soup = BeautifulSoup(open("data/results", "r"))matches = select(soup, "a.report")for match in matches:print "http://www.bbc.co.uk/%s" %(match.get("href"))
然后,我将运行此脚本的输出通过管道传递到wget中:
find_all_matches.py | xargs wget -O data/raw
更新抓取和导入代码以处理多个匹配相对简单。 从头到尾的整个过程如下所示:
大部分代码处于“抓取魔术”阶段,在该阶段中,我经历了所有事件并提取了可以在图中链接在一起的适当元素。
例如,任意球和犯规事件通常相邻,因此我们希望淘汰两名参与者,纸牌类型,事件时间和事件发生的比赛。
我使用Python的Beautiful Soup库完成此任务,但是没有理由您不能使用另一套工具。
README页面显示了如何创建您自己的图版本,但是这里是使用Rik的 元图查询查询图外观的概述:
到目前为止,这是我最喜欢的一些查询:
哪个射门次数超过10的球员转换率最高?
match (a:Attempt)<-[:HAD_ATTEMPT]-(app)<-[:MADE_APPEARANCE]-(player),(app)-[:FOR_TEAM]-(team)
WITH player, COUNT(*) as times, COLLECT(a) AS attempts, team
WITH player, times, LENGTH([a in attempts WHERE a:Goal]) AS goals, team
WHERE times > 10
RETURN player.name, team.name, goals, times, (goals * 1.0 / times) AS conversionRate
ORDER BY conversionRate DESC
LIMIT 10==> +------------------------------------------------------------------------------------+
==> | player.name | team.name | goals | times | conversionRate |
==> +------------------------------------------------------------------------------------+
==> | "Luiz Adriano" | "Shakhtar Donetsk" | 9 | 14 | 0.6428571428571429 |
==> | "Yacine Brahimi" | "FC Porto" | 5 | 13 | 0.38461538461538464 |
==> | "Mario Mandzukic" | "Atlético de Madrid" | 5 | 14 | 0.35714285714285715 |
==> | "Sergio Agüero" | "Manchester City" | 6 | 18 | 0.3333333333333333 |
==> | "Karim Benzema" | "Real Madrid" | 6 | 19 | 0.3157894736842105 |
==> | "Klaas-Jan Huntelaar" | "FC Schalke 04" | 5 | 16 | 0.3125 |
==> | "Neymar" | "Barcelona" | 9 | 29 | 0.3103448275862069 |
==> | "Thomas Müller" | "FC Bayern München" | 7 | 24 | 0.2916666666666667 |
==> | "Jackson Martínez" | "FC Porto" | 7 | 24 | 0.2916666666666667 |
==> | "Callum McGregor" | "Celtic" | 3 | 11 | 0.2727272727272727 |
==> +------------------------------------------------------------------------------------+
哪些球员因犯规立即报仇?
match (firstFoul:Foul)-[:COMMITTED_AGAINST]->(app1)<-[:MADE_APPEARANCE]-(revengeFouler),(app1)-[:IN_MATCH]->(match), (firstFoulerApp)-[:COMMITTED_FOUL]->(firstFoul),(app1)-[:COMMITTED_FOUL]->(revengeFoul)-[:COMMITTED_AGAINST]->(firstFoulerApp),(firstFouler)-[:MADE_APPEARANCE]->(firstFoulerApp)
WHERE (firstFoul)-[:NEXT]->(revengeFoul)
RETURN firstFouler.name AS firstFouler, revengeFouler.name AS revengeFouler, firstFoul.time, revengeFoul.time, match.home + " vs " + match.away==> +---------------------------------------------------------------------------------------------------------------------------------+
==> | firstFouler | revengeFouler | firstFoul.time | revengeFoul.time | match.home + " vs " + match.away |
==> +---------------------------------------------------------------------------------------------------------------------------------+
==> | "Derk Boerrigter" | "Jean Philippe Mendy" | "88:48" | "89:42" | "Celtic vs NK Maribor" |
==> | "Mario Suárez" | "Pajtim Kasami" | "27:17" | "32:38" | "Olympiakos vs Atlético de Madrid" |
==> | "Aleksandr Volodko" | "Casemiro" | "39:27" | "44:32" | "FC Porto vs BATE Borisov" |
==> | "Thomas Müller" | "Mario Fernandes" | "87:22" | "88:31" | "CSKA Moscow vs FC Bayern München" |
==> | "Vinicius" | "Marco Verratti" | "56:36" | "58:00" | "APOEL Nicosia vs Paris Saint Germain" |
==> | "Lasse Schöne" | "Dani Alves" | "84:08" | "86:18" | "Barcelona vs Ajax" |
==> | "Nick Viergever" | "Dani Alves" | "57:22" | "60:37" | "Barcelona vs Ajax" |
==> | "Nani" | "Atsuto Uchida" | "6:10" | "8:40" | "FC Schalke 04 vs Sporting Lisbon" |
==> | "Andreas Samaris" | "Yannick Ferreira-Carrasco" | "89:21" | "90:00 +4:21" | "Monaco vs Benfica" |
==> | "Simon Kroon" | "Guillherme Siqueira" | "84:05" | "90:00 +0:29" | "Atlético de Madrid vs Malmö FF" |
==> | "Mario Suárez" | "Isaac Thelin" | "32:02" | "38:47" | "Atlético de Madrid vs Malmö FF" |
==> | "Hakan Balta" | "Henrikh Mkhitaryan" | "62:09" | "64:14" | "Borussia Dortmund vs Galatasaray" |
==> | "Marco Reus" | "Selcuk Inan" | "36:17" | "44:03" | "Borussia Dortmund vs Galatasaray" |
==> | "Hakan Balta" | "Sven Bender" | "10:57" | "12:51" | "Borussia Dortmund vs Galatasaray" |
==> | "Vinicius" | "Edinson Cavani" | "87:56" | "90:00 +1:25" | "Paris Saint Germain vs APOEL Nicosia" |
==> | "Jackson Martínez" | "Carlos Gurpegi" | "64:55" | "66:17" | "Athletic Club vs FC Porto" |
==> | "Nani" | "Chinedu Obasi" | "1:30" | "4:47" | "Sporting Lisbon vs FC Schalke 04" |
==> | "Vitali Rodionov" | "Bruno Martins Indi" | "52:16" | "60:08" | "BATE Borisov vs FC Porto" |
==> | "Raheem Sterling" | "Behrang Safari" | "29:00" | "33:27" | "Liverpool vs FC Basel" |
==> | "Derlis González" | "Fábio Coentrão" | "52:55" | "57:59" | "FC Basel vs Real Madrid" |
==> | "Josip Drmic" | "Lisandro López" | "15:04" | "17:35" | "Benfica vs Bayer 04 Leverkusen" |
==> | "Fred" | "Bastian Schweinsteiger" | "6:04" | "9:28" | "Shakhtar Donetsk vs FC Bayern München" |
==> | "Alex Sandro" | "Derlis González" | "4:07" | "7:28" | "FC Basel vs FC Porto" |
==> | "Luca Zuffi" | "Ruben Neves" | "73:49" | "84:44" | "FC Porto vs FC Basel" |
==> | "Marco Verratti" | "Oscar" | "28:49" | "34:04" | "Chelsea vs Paris Saint Germain" |
==> | "Cristiano Ronaldo" | "Jesús Gámez" | "20:59" | "25:37" | "Real Madrid vs Atlético de Madrid" |
==> | "Bernardo Silva" | "Álvaro Morata" | "49:20" | "62:31" | "Monaco vs Juventus" |
==> | "Arturo Vidal" | "Fabinho" | "38:19" | "45:00" | "Monaco vs Juventus" |
==> +---------------------------------------------------------------------------------------------------------------------------------+
哪些球员花费最长的时间为犯规报仇?
match (foul1:Foul)-[:COMMITTED_AGAINST]->(app1)-[:COMMITTED_FOUL]->(foul2)-[:COMMITTED_AGAINST]->(app2)-[:COMMITTED_FOUL]->(foul1),(player1)-[:MADE_APPEARANCE]->(app1), (player2)-[:MADE_APPEARANCE]->(app2),(foul1)-[:COMMITTED_IN_MATCH]->(match:Match)<-[:COMMITTED_IN_MATCH]-(foul2)
WHERE (foul1)-[:NEXT*]->(foul2)
WITH match, foul1, player1, player2, foul2 ORDER BY foul1.sortableTime, foul2.sortableTime
WITH match, foul1, player1, player2, COLLECT(foul2) AS revenge
WITH match, foul1, player1,player2, revenge[0] AS revengeFoul
RETURN player1.name, player2.name, foul1.time, revengeFoul.time, revengeFoul.sortableTime - foul1.sortableTime AS secondsWaited, match.home + " vs " + match.away AS match
ORDER BY secondsWaited DESC
LIMIT 5==> +---------------------------------------------------------------------------------------------------------------------------+
==> | player1.name | player2.name | foul1.time | revengeFoul.time | secondsWaited | match |
==> +---------------------------------------------------------------------------------------------------------------------------+
==> | "Stefan Johansen" | "Ondrej Duda" | "1:30" | "82:11" | 4841 | "Legia Warsaw vs Celtic" |
==> | "Neymar" | "Vinicius" | "2:35" | "80:08" | 4653 | "Barcelona vs APOEL Nicosia" |
==> | "Jérémy Toulalan" | "Stefan Kießling" | "9:19" | "86:37" | 4638 | "Monaco vs Bayer 04 Leverkusen" |
==> | "Nabil Dirar" | "Domenico Criscito" | "6:32" | "82:39" | 4567 | "Zenit St Petersburg vs Monaco" |
==> | "Nabil Dirar" | "Eliseu" | "7:20" | "81:30" | 4450 | "Monaco vs Benfica" |
==> +---------------------------------------------------------------------------------------------------------------------------+
谁的镜头最多?
match (team)<-[:FOR_TEAM]-(app)<-[appRel:MADE_APPEARANCE]-(player:Player)
optional match (a:Attempt)<-[att:HAD_ATTEMPT]-(app)
WITH player, COUNT( DISTINCT appRel) AS apps, COUNT(att) as times, COLLECT(a) AS attempts, team
WITH player,apps, times, LENGTH([a in attempts WHERE a:Goal]) AS goals, team
WHERE times > 10
RETURN player.name, team.name, apps, goals, times, (goals * 1.0 / times) AS conversionRate
ORDER BY times DESC
LIMIT 10==> +-------------------------------------------------------------------------------------------+
==> | player.name | team.name | apps | goals | times | conversionRate |
==> +-------------------------------------------------------------------------------------------+
==> | "Cristiano Ronaldo" | "Real Madrid" | 12 | 10 | 69 | 0.14492753623188406 |
==> | "Lionel Messi" | "Barcelona" | 12 | 10 | 51 | 0.19607843137254902 |
==> | "Robert Lewandowski" | "FC Bayern München" | 12 | 6 | 43 | 0.13953488372093023 |
==> | "Carlos Tévez" | "Juventus" | 12 | 7 | 34 | 0.20588235294117646 |
==> | "Gareth Bale" | "Real Madrid" | 10 | 2 | 32 | 0.0625 |
==> | "Luis Suárez" | "Barcelona" | 9 | 6 | 30 | 0.2 |
==> | "Neymar" | "Barcelona" | 11 | 9 | 29 | 0.3103448275862069 |
==> | "Hakan Calhanoglu" | "Bayer 04 Leverkusen" | 8 | 2 | 29 | 0.06896551724137931 |
==> | "Edinson Cavani" | "Paris Saint Germain" | 8 | 6 | 27 | 0.2222222222222222 |
==> | "Alexis Sánchez" | "Arsenal" | 9 | 4 | 25 | 0.16 |
==> +-------------------------------------------------------------------------------------------+
也许您能想到一些更酷的产品? 我很想看到他们。 从github获取代码并尝试一下。
翻译自: https://www.javacodegeeks.com/2015/06/neo4j-the-bbc-champions-league-graph.html
neo4j :rel