Note: At the time of writing, Egencia's robots.txt (which says what an automated process can access) permits this use case.
# robots.txt for https://www.egencia.com/daily/resource/
User-agent: *
Disallow: https://www.egencia.com/pub/
# Don't access any live site files
Disallow: https://www.egencia.com/daily/resource/documents/
# Don't spider documents
Disallow: https://www.egencia.com/daily/resource/council/
# TM Council site
Disallow: https://www.egencia.com/daily/resource/solution/
# TM Help Desk
Disallow: https://www.egencia.com/daily/resource/training/
# Training calendar
Disallow: https://www.egencia.com/daily/resource/images/
# all tmrc images
Disallow: https://www.egencia.com/daily/egts/*
# EGTS jump page
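If you want to check a URL against rules like these from code, Python's standard library has a parser for them. The sketch below is only an illustration: it assumes the rules are written as plain paths (urllib.robotparser compares against the path of the URL), it uses a shortened subset of the rules above, and the bot name and the URLs being tested are made up.

```python
from urllib.robotparser import RobotFileParser

# Assumption: a subset of the rules above, rewritten in path form
RULES = """\
User-agent: *
Disallow: /pub/
Disallow: /daily/resource/documents/
Disallow: /daily/resource/images/
"""

parser = RobotFileParser()
# parse() takes the file's lines directly, so nothing is fetched over the network
parser.parse(RULES.splitlines())


def allowed(url, agent="my-hotel-bot"):
    """Return True if the parsed rules let `agent` fetch `url`."""
    return parser.can_fetch(agent, url)


print(allowed("https://www.egencia.com/pub/index.html"))
print(allowed("https://www.egencia.com/hotel-search-service/v1/hotels"))
```

With these rules the first URL is refused and the second is allowed, because only paths that start with a disallowed prefix are blocked.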
The Challenge
I currently work for one of the top 4 IT consultancies as a Data Scientist and as a northerner that means lots of travelling to London... On the plus side it does mean I get to collect my fair share of hotel points, which brings me onto the reason for this blog.
I was sat in my hotel room one cold, dark and stormy night (just kidding I was down south) and I wondered if I could get notified when double points offers were available for my travel plans. This would massively boost my points earning capability! I thought this is going to be a piece of cake! So, I got on my laptop and went straight into the Python terminal to start tinkering.
typing import requests, r = requests.get('www.the-website.com') ... stops typing
Alas, I was thwarted by a login screen on the corporate travel booking website Egencia (the corporate version of Expedia). Not only that, but it required a session token and a myriad of other security features. The weeks went by, with a few mentions of my project to my colleagues on how to get into this garrison of hotel offers. Then I read a great article on Medium about the Selenium web driver, with which you can mimic a human by programming the series of clicks and keystrokes needed to reach the information you require. In this article William Koehrsen shows how he used Selenium to upload his homework programmatically using Python's selenium package. I finally had the key to the holy grail of points!
Initially, getting Selenium up and running wasn't the easiest of tasks, mainly because I wanted it to be in its own Docker container and my notebook to be running in another container. In the end I gave up and used a Python virtual environment on my laptop and a pre-cooked Selenium Docker container.
Now I could get to work. I started to inspect Egencia's web page, found the fields needed for the login and I was in! I felt like James Bond. Now all I had to do was navigate my way through a minefield of pop-ups, hidden DOM objects, 'load more' buttons, etc... sigh. This solution worked, but it had many problems, speed being a major one. However, I've still put the code up on GitHub for reference (link).
I was not happy with this solution, and I really didn't feel like a good developer, as I was relying on the View of an MVC application, which will inevitably change a lot more frequently than the Model. So I put my developer hat on with one of my good friends, Sia Gholami, and went back to requests, but this time I created a session. The first thing I needed was to get all the different tokens from the login page, so I called get() on the session. Since the session is still active afterwards, a later post() on that same session carries the session tokens that were picked up by the get request.
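To see why the session matters, here is a minimal sketch that runs without touching the network. It assumes a made-up cookie name, value and site (example.com), and it places the cookie in the jar by hand to stand in for what a real get() would store when the server replies with a Set-Cookie header.

```python
import requests

session = requests.Session()

# Stand-in for a real session.get(login_url): put a cookie in the session's jar.
# The cookie name, value and domain are invented for this sketch.
session.cookies.set("session_token", "abc123", domain="example.com", path="/")

# Build (but do not send) a POST to the same site through the session
request = requests.Request(
    "POST",
    "https://example.com/login",
    data={"userName": "me", "password": "secret"},
)
prepared = session.prepare_request(request)

# The session attached the stored cookie to the outgoing request for us
cookie_header = prepared.headers["Cookie"]
print(cookie_header)  # session_token=abc123
```

A bare requests.post() call has no such jar behind it, so it would go out without the cookie.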
Authenticating with requests
Check out the repo
import requests
from bs4 import BeautifulSoup
import json


class Egencia(object):
    def __init__(self, user, password):
        self.user = user
        self.password = password
        # One session for every request, so cookies set at login are reused
        self.session = requests.Session()
        self.__login()

    def __login(self):
        login_screen = self.session.get("https://www.egencia.co.uk/app?service=external&page=Login&mode=form&market=GB&lang=en")
        # Find the POST form (the "lxml" parser needs the lxml package installed)
        b = BeautifulSoup(login_screen.text, "lxml")
        form = b.find_all("form")[0]
        action = form["action"].replace("./accessToken", "")
        # Make a payload with all the default values for the form
        data = {}
        for input_node in form.find_all("input"):
            # Skip inputs without a name; a missing value defaults to ""
            if input_node.get("name"):
                data[input_node["name"]] = input_node.get("value", "")
        # Override the username and password fields (default is "")
        data["userName"] = self.user
        data["password"] = self.password
        # Use this payload to login
        new_url = "https://www.egencia.com/auth/v1/accessToken" + action
        r = self.session.post(new_url, data=data)
        # Raise an exception if the server answers with an HTTP error status
        r.raise_for_status()
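The form-scraping step in the login method is easier to follow in isolation. The sketch below is a stand-alone illustration, not the code from the repo: it uses the standard library's html.parser instead of BeautifulSoup so it runs without extra installs, the HTML is a made-up login form (the csrfToken field and its value are invented), and unlike the login method it reads every input on the page rather than only the first form.

```python
from html.parser import HTMLParser


class FormDefaults(HTMLParser):
    """Collect the name/value pair of every named <input> tag."""

    def __init__(self):
        super().__init__()
        self.data = {}

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            attrs = dict(attrs)
            # Unnamed inputs (such as a submit button) are not part of the payload
            if attrs.get("name"):
                self.data[attrs["name"]] = attrs.get("value") or ""


# Hypothetical login page, invented for this sketch
LOGIN_HTML = """
<form action="./accessToken?market=GB" method="post">
  <input type="hidden" name="csrfToken" value="f00ba4">
  <input type="text" name="userName" value="">
  <input type="password" name="password" value="">
  <input type="submit" value="Sign in">
</form>
"""

parser = FormDefaults()
parser.feed(LOGIN_HTML)

# Start from the defaults, then override the two fields we actually know
payload = parser.data
payload["userName"] = "me@example.com"
payload["password"] = "secret"
print(payload)
```

The hidden field survives untouched in the payload, which is the whole point: the server gets back the token it handed out with the login page.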
What now?
Now we just need to add a method for querying Egencia for a certain hotel brand in a specific city over some set period of time. This is now just a case of creating a URL string. (I found this URL by using the Chrome developer tools and inspecting the network traffic to see what happened when I tried booking my hotels manually.)
Also note that there are some IDs we have to find first, namely the user ID and the company ID. Again, I found out where I could get these from by following the network traffic using the Chrome developer tools.
Check out the repo
    def get_all_rooms(self, lon, lat, check_in, check_out, brand):
        """Search for rooms near (lon, lat); this is a method of the Egencia class above"""
        # Look up the user id first, then the company id that user belongs to
        user_id = self.session.get("https://www.egencia.co.uk/user-service/v2/users/" + self.user).json()["additional_information"]["user_id"]
        company_id = self.session.get("https://www.egencia.co.uk/user-service/v2/users/{}?include=roles&include=info".format(user_id)).json()["company_ids"][0]
        # Fixed part of the request URL, as captured from the browser's network traffic
        hotels_url = "".join(["https://www.egencia.co.uk/hotel-search-service/v1/hotels?",
                              "radius_unit=km&adults_per_room=1&search_type=ADDRESS&start=0&source=HOTEL_WEBAPP&source_version=1.0&search_page=SEARCH_RESULTS&rate_amenity=&hotel_amenity=&want_in_policy_rates_only=false&want_central_bill_eligible_rates_only=false&want_prepaid_rates_only=false&want_free_cancellation_rates_only=false&chain_id=&neighborhood_filter_id=&minimum_stars=0&minimum_price=&maximum_price=&apply_prefilters=true",
                              "&count=9999"])
        # Search-specific parameters; requests URL-encodes them and appends them to the query string
        params = {
            "longitude": lon,
            "latitude": lat,
            "check_in_date": check_in,
            "check_out_date": check_out,
            "hotel_name": brand,
            "main_traveler": user_id,
            "traveler": user_id,
            "company_id": company_id,
        }
        search = self.session.get(hotels_url, params=params)
        return search.json()
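One thing to watch when gluing a URL together by hand is that values such as a hotel brand with a space in it need escaping. As a rough sketch, the search-specific part of the query string could be built with the standard library's urlencode. The function name build_hotel_search_url is made up for this example, as are the coordinates and the two IDs; the parameter names are the ones used in the search URL above.

```python
from urllib.parse import urlencode


def build_hotel_search_url(base_url, lon, lat, check_in, check_out, brand, user_id, company_id):
    """Return base_url with the search parameters appended as an escaped query string."""
    params = {
        "longitude": lon,
        "latitude": lat,
        "check_in_date": check_in,
        "check_out_date": check_out,
        "hotel_name": brand,
        "main_traveler": user_id,
        "traveler": user_id,
        "company_id": company_id,
    }
    # urlencode escapes each value, so "holiday inn" becomes "holiday+inn"
    return base_url + "?" + urlencode(params)


url = build_hotel_search_url(
    "https://www.egencia.co.uk/hotel-search-service/v1/hotels",
    lon=-0.0982, lat=51.3762,            # made-up coordinates
    check_in="2018-04-02", check_out="2018-04-07",
    brand="holiday inn",
    user_id=12345, company_id=678,       # made-up ids
)
print(url)
```

Passing the same dictionary to the session's get() through its params argument has the same effect, which is handy when the session is already doing the request.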
The final solution
Now I wanted to wrap this up neatly. As I mentioned at the beginning, I wanted to be notified and for this to run automatically, so I looked to two other technologies, one more recent than the other: the age-old cron job, for running jobs at a set time, and Telegram, which is a great free messaging application that lets you make bots!
I whacked the class I made above into a separate file called egencia_obj.py, then created a few helper functions. I also made a CSV, data.csv, using Excel, showing the following:
Date | City | Max cost
02/04/2018 | Croydon | 85
03/04/2018 | Croydon | 85
04/04/2018 | Croydon | 85
05/04/2018 | Croydon | 85
06/04/2018 | Croydon | 85
07/04/2018 | |
08/04/2018 | |
09/04/2018 | Croydon | 85
10/04/2018 | Croydon | 85
11/04/2018 | Croydon | 85
12/04/2018 | Croydon | 85
13/04/2018 | Croydon | 85
14/04/2018 | |
15/04/2018 | |
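Each row is one night, so the nights have to be merged into stays before anything can be searched. Here is a small sketch of that idea using only the standard library (csv rather than pandas) on the first few rows of the table. The function name nights_to_stays is made up, and the sketch assumes the rows are already in date order and that a stay also ends when a night is skipped.

```python
import csv
import io
from datetime import datetime, timedelta

CSV_TEXT = """Date,City,Max cost
02/04/2018,Croydon,85
03/04/2018,Croydon,85
04/04/2018,Croydon,85
05/04/2018,Croydon,85
06/04/2018,Croydon,85
07/04/2018,,
08/04/2018,,
09/04/2018,Croydon,85
"""


def nights_to_stays(rows):
    """Merge consecutive nights in the same city into single stays."""
    stays = []
    for row in rows:
        city = row["City"].strip()
        if not city:
            continue  # blank city: no hotel needed that night
        night = datetime.strptime(row["Date"], "%d/%m/%Y")
        last = stays[-1] if stays else None
        if last and last["city"] == city and last["check out"] == night:
            # This night directly follows the previous one: extend the stay
            last["check out"] = night + timedelta(days=1)
        else:
            stays.append({"city": city, "check in": night,
                          "check out": night + timedelta(days=1),
                          "max amount": float(row["Max cost"])})
    return stays


stays = nights_to_stays(csv.DictReader(io.StringIO(CSV_TEXT)))
for stay in stays:
    print(stay["city"], stay["check in"].date(), "->", stay["check out"].date())
# Croydon 2018-04-02 -> 2018-04-07
# Croydon 2018-04-09 -> 2018-04-10
```

The check-out date is the morning after the last night, which is why five nights starting on the 2nd give a check-out on the 7th.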
Now, when the script below runs, it messages me with any Hilton offers it has found on Egencia's website! Note that I put all of my credentials into a separate file, settings.py.
Check out the repo
import itertools
from datetime import datetime, timedelta

from geopy.geocoders import Nominatim
import telegram
import pandas as pd

import settings
from egencia_obj import Egencia


def find_double_points_offers(client, lon, lat, check_in, check_out, brand, max_amount, max_distance):
    """Function to parse their response payload to get what we actually want"""
    out = client.get_all_rooms(lon, lat, check_in, check_out, brand)
    offers = []
    for hotel in out["hotels"]:
        if hotel.get("rooms") and float(hotel["location"]["distance_from_search"]["value"]) < max_distance:
            for room in hotel["rooms"]:
                if '2 X Points' in room["description"] and room["price"]["user_currency"]["amount"] <= max_amount:
                    offers.append({
                        "hotel_name": hotel["hotel_name"],
                        "description": room["description"],
                        "rate": room["price"]["user_currency"]["amount"],
                        "check in": check_in,
                        "check out": check_out
                    })
    return offers


def get_check_in_out_dates(path):
    """Helper function to convert the csv into one check in/check out pair per stay"""
    data = pd.read_csv(path)
    data["Date"] = pd.to_datetime(data["Date"], format="%d/%m/%Y")
    # Sort by date and renumber the rows so that i + 1 is always the next night
    data = data.sort_values(by="Date").reset_index(drop=True)
    out = []
    tmp = {}
    for i, row in data.iterrows():
        city = row["City"]
        if pd.isnull(city):
            continue  # no hotel needed that night
        if not tmp:
            tmp["check in"] = row["Date"].strftime('%Y-%m-%d')
        # The stay ends on the last row, or when the next night has a different (or no) city
        if i == len(data) - 1 or data.loc[i + 1, "City"] != city:
            tmp["check out"] = (row["Date"] + timedelta(days=1)).strftime('%Y-%m-%d')
            tmp["city"] = city
            tmp["max amount"] = row["Max cost"]
            out.append(tmp)
            tmp = {}
    return out


def make_pretty_message(offers):
    """Helper function to produce markdown to send to Telegram"""
    md = ""
    for offer in offers:
        md += "*{}*\n".format(offer['hotel_name'])
        md += "_{} - {}\n".format(datetime.strptime(offer["check in"], "%Y-%m-%d").strftime('%d/%m/%Y'),
                                  datetime.strptime(offer["check out"], "%Y-%m-%d").strftime('%d/%m/%Y'))
        md += "{}\n".format(offer["description"])
        md += "£{0:.2f}_\n".format(offer["rate"])
        md += "\n"
    return md


def remove_duplicate_results(results):
    """Helper function to remove any duplicates in case they happen"""
    return [dict(y) for y in set(tuple(x.items()) for x in results)]


def get_date_perms(check_in, check_out):
    """Helper function to return every (check in, check out) pair within a stay, longest first"""
    start = datetime.strptime(check_in, "%Y-%m-%d")
    end = datetime.strptime(check_out, "%Y-%m-%d")
    # Every date from check in to check out inclusive
    all_dates = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    # combinations() keeps the input order, so each pair is already (earlier, later)
    date_perms = [(a.strftime('%Y-%m-%d'), b.strftime('%Y-%m-%d')) for a, b in itertools.combinations(all_dates, 2)]
    return sorted(date_perms, key=lambda tup: datetime.strptime(tup[1], "%Y-%m-%d") - datetime.strptime(tup[0], "%Y-%m-%d"), reverse=True)


def sort_results(results):
    """Helper function to sort the message in order of checkin date"""
    return sorted(results, key=lambda d: (datetime.strptime(d["check in"], "%Y-%m-%d"),
                                          datetime.strptime(d["check out"], "%Y-%m-%d") - datetime.strptime(d["check in"], "%Y-%m-%d")),
                  reverse=False)


if __name__ == "__main__":
    # Written against the synchronous python-telegram-bot API (releases before v20)
    bot = telegram.Bot(token=settings.API_TOKEN)
    chat_id = settings.CHAT_ID
    # Recent geopy releases require an explicit user_agent; the value here is arbitrary
    geolocator = Nominatim(user_agent="egencia-double-points")
    brand = "hilton"
    max_distance = 3
    results = []
    my_searches = get_check_in_out_dates("data.csv")
    e = Egencia(settings.CREDENTIALS["email"], settings.CREDENTIALS["password"])
    for my_search in my_searches:
        max_amount = my_search["max amount"]
        location = geolocator.geocode(my_search["city"])
        if location is None:
            print("Could not geocode {}, skipping".format(my_search["city"]))
            continue
        all_dates = get_date_perms(my_search["check in"], my_search["check out"])
        for check_in, check_out in all_dates:
            print("Searching Egencia for: [brand: {}, max_amount: {}, location: {}, check_in: {}, check_out: {}]".format(brand, max_amount, location, check_in, check_out))
            offers = find_double_points_offers(e, location.raw['lon'], location.raw['lat'], check_in, check_out, brand, max_amount, max_distance)
            if offers:
                print("I found some offers!\033[1;36m")
                print(offers)
                print("\033[0;0m")
                results += offers
    # Send a single message once every search has finished
    if results:
        dd_results = remove_duplicate_results(results)
        sorted_results = sort_results(dd_results)
        output = make_pretty_message(sorted_results)
        bot.send_message(chat_id, output, parse_mode=telegram.ParseMode.MARKDOWN)
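The date helper in the script is the least obvious part, so here is the same idea in isolation: for one stay, list every possible (check in, check out) pair inside it, longest first, presumably because an offer may only be available for part of the stay. This is a simplified sketch that works on date objects instead of strings, and the name date_pairs is made up.

```python
import itertools
from datetime import date, timedelta


def date_pairs(check_in, check_out):
    """Return every (check in, check out) pair within the stay, longest first."""
    days = [check_in + timedelta(days=i) for i in range((check_out - check_in).days + 1)]
    # combinations() keeps the input order, so the first date of a pair is always earlier
    pairs = itertools.combinations(days, 2)
    return sorted(pairs, key=lambda pair: pair[1] - pair[0], reverse=True)


pairs = date_pairs(date(2018, 4, 2), date(2018, 4, 5))
for start, end in pairs:
    print(start, "->", end)
# 2018-04-02 -> 2018-04-05
# 2018-04-02 -> 2018-04-04
# 2018-04-03 -> 2018-04-05
# 2018-04-02 -> 2018-04-03
# 2018-04-03 -> 2018-04-04
# 2018-04-04 -> 2018-04-05
```

A three-night stay already produces six searches, so the number of requests grows quickly with the length of the stay.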
After this runs I get sent a message with all of the offers it finds! If I see anything I like, I can then log in to Egencia and book it. I haven't got as far as adding a 'book it' link, but that could be the next stage for this project.
Here are a few requests challenges for you:
- Automatically 'log in' to Netflix or Amazon Video and find your most recently watched film/series and make your desktop background the artwork image.
- Do the same as I have but with Hilton, Marriott, Holiday Inn, whatever floats your boat.
- If you're feeling risky (and I don't advise this challenge), you could log into your online banking (depending on how their system works), pull your statements and train a classifier to say whether a payment was on expenses or was a personal payment. This is something I'm waiting for with the open banking initiative: if the banks provide an API, that would be more secure than storing plaintext passwords on your machine!