Initial commit
Belivins committed Aug 23, 2023
0 parents commit ac302e6
Showing 11 changed files with 368,634 additions and 0 deletions.
3 changes: 3 additions & 0 deletions .idea/.gitignore

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

8 changes: 8 additions & 0 deletions .idea/Infotecks.iml

6 changes: 6 additions & 0 deletions .idea/inspectionProfiles/profiles_settings.xml

7 changes: 7 additions & 0 deletions .idea/misc.xml

8 changes: 8 additions & 0 deletions .idea/modules.xml

6 changes: 6 additions & 0 deletions .idea/vcs.xml

368,272 changes: 368,272 additions & 0 deletions RU.txt

Large diffs are not rendered by default.

Binary file added Razrabotchik-Python-Ufa.docx
Binary file not shown.
141 changes: 141 additions & 0 deletions readme.txt
@@ -0,0 +1,141 @@

Readme for GeoNames Gazetteer extract files

============================================================================================================

This work is licensed under a Creative Commons Attribution 4.0 License,
see https://creativecommons.org/licenses/by/4.0/
The Data is provided "as is" without warranty or any representation of accuracy, timeliness or completeness.

The data format is tab-delimited text in utf8 encoding.


Files :
-------
XX.zip : features for country with iso code XX, see 'geoname' table for columns. 'no-country' for features not belonging to a country.
allCountries.zip : all countries combined in one file, see 'geoname' table for columns
cities500.zip : all cities with a population > 500 or seats of adm div down to PPLA4 (ca 185.000), see 'geoname' table for columns
cities1000.zip : all cities with a population > 1000 or seats of adm div down to PPLA3 (ca 130.000), see 'geoname' table for columns
cities5000.zip : all cities with a population > 5000 or PPLA (ca 50.000), see 'geoname' table for columns
cities15000.zip : all cities with a population > 15000 or capitals (ca 25.000), see 'geoname' table for columns
alternateNamesV2.zip : alternate names with language codes and geonameId, file with iso language codes, with new columns from and to
alternateNames.zip : obsolete use V2, this file does not have the new columns to and from and will be removed in the future
admin1CodesASCII.txt : names in English for admin divisions. Columns: code, name, name ascii, geonameid
admin2Codes.txt : names for administrative subdivision 'admin2 code' (UTF8), Format : concatenated codes <tab>name <tab> asciiname <tab> geonameId
iso-languagecodes.txt : iso 639 language codes, as used for alternate names in file alternateNames.zip
featureCodes.txt : name and description for feature classes and feature codes
timeZones.txt : countryCode, timezoneId, gmt offset on 1st of January, dst offset to gmt on 1st of July (of the current year), rawOffset without DST
countryInfo.txt : country information : iso codes, fips codes, languages, capital ,...
see the geonames webservices for additional country information,
bounding box : http://api.geonames.org/countryInfo?
country names in different languages : http://api.geonames.org/countryInfoCSV?lang=it
modifications-<date>.txt : all records modified on the previous day, the date is in yyyy-MM-dd format. You can use this file to daily synchronize your own geonames database.
deletes-<date>.txt : all records deleted on the previous day, format : geonameId <tab> name <tab> comment.

alternateNamesModifications-<date>.txt : all alternate names modified on the previous day,
alternateNamesDeletes-<date>.txt : all alternate names deleted on the previous day, format : alternateNameId <tab> geonameId <tab> name <tab> comment.
userTags.zip : user tags , format : geonameId <tab> tag.
hierarchy.zip : parentId, childId, type. The type 'ADM' stands for the admin hierarchy modeled by the admin1-4 codes. The other entries are entered with the user interface. The relation toponym-adm hierarchy is not included in the file, it can instead be built from the admincodes of the toponym.
adminCode5.zip : the new adm5 column is not yet exported in the other files (in order to not break import scripts). Instead it is available as a separate file.
columns: geonameId,adm5code
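
The daily synchronization mentioned for the modifications and deletes files could be sketched roughly as follows. This is only an illustration: it assumes the local copy is an in-memory dict keyed by geonameid and that modification rows use the same column layout as the 'geoname' table. The function name apply_daily_changes and its parameters are hypothetical.

```python
import csv
import io


def apply_daily_changes(db, modifications_text, deletes_text):
    """Apply one day's modifications and deletes to db (dict: geonameid -> row).

    Assumption: modification rows use the 'geoname' column layout, so the
    first tab-separated field is the geonameid.
    """
    # QUOTE_NONE: the files are plain tab-delimited text, so quote characters
    # inside names must not be interpreted as CSV quoting.
    for row in csv.reader(io.StringIO(modifications_text), delimiter="\t",
                          quoting=csv.QUOTE_NONE):
        if row:
            db[int(row[0])] = row  # insert a new record or overwrite the old one
    for row in csv.reader(io.StringIO(deletes_text), delimiter="\t",
                          quoting=csv.QUOTE_NONE):
        if row:
            db.pop(int(row[0]), None)  # deletes format: geonameId, name, comment
    return db
```

Modifications are applied before deletes here so that a record modified and then deleted on the same day ends up removed; whether that ordering matches your own import rules is a design choice to verify.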

The main 'geoname' table has the following fields :
---------------------------------------------------
geonameid : integer id of record in geonames database
name : name of geographical point (utf8) varchar(200)
asciiname : name of geographical point in plain ascii characters, varchar(200)
alternatenames : alternatenames, comma separated, ascii names automatically transliterated, convenience attribute from alternatename table, varchar(10000)
latitude : latitude in decimal degrees (wgs84)
longitude : longitude in decimal degrees (wgs84)
feature class : see http://www.geonames.org/export/codes.html, char(1)
feature code : see http://www.geonames.org/export/codes.html, varchar(10)
country code : ISO-3166 2-letter country code, 2 characters
cc2 : alternate country codes, comma separated, ISO-3166 2-letter country code, 200 characters
admin1 code : fipscode (subject to change to iso code), see exceptions below, see file admin1CodesASCII.txt for display names of this code; varchar(20)
admin2 code : code for the second administrative division, a county in the US, see file admin2Codes.txt; varchar(80)
admin3 code : code for third level administrative division, varchar(20)
admin4 code : code for fourth level administrative division, varchar(20)
population : bigint (8 byte int)
elevation : in meters, integer
dem : digital elevation model, srtm3 or gtopo30, average elevation of 3''x3'' (ca 90mx90m) or 30''x30'' (ca 900mx900m) area in meters, integer. srtm processed by cgiar/ciat.
timezone : the iana timezone id (see file timeZones.txt) varchar(40)
modification date : date of last modification in yyyy-MM-dd format
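
As an illustration of the field list above, one tab-delimited line can be split into a record like this. It is a minimal sketch, not an official parser: the underscore field names, GEONAME_FIELDS and parse_geoname_line are hypothetical, and treating an empty population as 0 is an assumption.

```python
GEONAME_FIELDS = [
    "geonameid", "name", "asciiname", "alternatenames", "latitude",
    "longitude", "feature_class", "feature_code", "country_code", "cc2",
    "admin1_code", "admin2_code", "admin3_code", "admin4_code",
    "population", "elevation", "dem", "timezone", "modification_date",
]


def parse_geoname_line(line):
    """Split one tab-delimited line into a dict keyed by field name."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(GEONAME_FIELDS):
        raise ValueError(
            f"expected {len(GEONAME_FIELDS)} fields, got {len(values)}")
    record = dict(zip(GEONAME_FIELDS, values))
    record["geonameid"] = int(record["geonameid"])
    record["latitude"] = float(record["latitude"])
    record["longitude"] = float(record["longitude"])
    # Assumption: an empty population field is treated as 0.
    record["population"] = int(record["population"] or 0)
    return record
```

Splitting on the tab character directly, rather than using a CSV dialect with quoting, keeps quote characters that appear inside place names intact.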


AdminCodes:
Most adm1 are FIPS codes. ISO codes are used for US, CH, BE and ME. UK and Greece are using an additional level between country and fips code. The code '00' stands for general features where no specific adm1 code is defined.
The corresponding admin feature is found with the same countrycode and adminX codes and the respective feature code ADMx.
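
Resolving an admin1 code to its display name might look like the sketch below. It assumes that the 'code' column of admin1CodesASCII.txt is the country code and the admin1 code joined by a dot (for example 'XX.01'); the readme only says "concatenated codes" for admin2, so check this against the actual file. load_admin1_names and admin1_name are hypothetical helper names.

```python
def load_admin1_names(lines):
    """Build {code: name} from admin1CodesASCII.txt lines.

    Columns per the readme: code, name, name ascii, geonameid.
    """
    names = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) >= 2:
            names[parts[0]] = parts[1]
    return names


def admin1_name(names, country_code, admin1_code):
    """Look up the display name; '00' means no specific adm1 is defined."""
    if admin1_code in ("", "00"):
        return None
    # Assumption: the lookup key is '<country code>.<admin1 code>'.
    return names.get(f"{country_code}.{admin1_code}")
```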



The table 'alternate names' :
-----------------------------
alternateNameId : the id of this alternate name, int
geonameid : geonameId referring to id in table 'geoname', int
isolanguage : iso 639 language code 2- or 3-characters; 4-characters 'post' for postal codes and 'iata', 'icao' and 'faac' for airport codes, 'fr_1793' for French Revolution names, 'abbr' for abbreviation, 'link' to a website (mostly to wikipedia), 'wkdt' for the wikidataid, varchar(7)
alternate name : alternate name or name variant, varchar(400)
isPreferredName : '1', if this alternate name is an official/preferred name
isShortName : '1', if this is a short name like 'California' for 'State of California'
isColloquial : '1', if this alternate name is a colloquial or slang term. Example: 'Big Apple' for 'New York'.
isHistoric : '1', if this alternate name is historic and was used in the past. Example 'Bombay' for 'Mumbai'.
from : from period when the name was used
to : to period when the name was used

Remark : the field 'alternatenames' in the table 'geoname' is a short version of the 'alternatenames' table without links and postal codes but with ascii transliterations. You probably don't need both.
If you don't need to know the language of a name variant, the field 'alternatenames' will be sufficient. If you need to know the language
of a name variant, then you will need to load the table 'alternatenames' and you can drop the column in the geoname table.
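
When the language of a variant does not matter, the 'alternatenames' field can be searched directly. The following is a small sketch of an exact, case-insensitive match against that comma-separated field; has_alternate_name is a hypothetical helper name.

```python
def has_alternate_name(alternatenames, query):
    """Case-insensitive exact match against a comma-separated name list."""
    wanted = query.strip().casefold()
    if not wanted:
        return False
    return any(name.strip().casefold() == wanted
               for name in alternatenames.split(","))
```

Matching whole list entries rather than substrings avoids false hits such as 'Mosk' matching 'Moskva'.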




Boundaries:
Simplified country boundaries are available in two slightly different formats:
shapes_simplified_low:
geonameId: The geonameId of the feature
geoJson: The boundary in geoJson format

shapes_simplified_low.json:
similar to the abovementioned file, but fully in geojson format. The geonameId is a feature property in the geojson string.


Statistics on the number of features per country and the feature class and code distributions : http://www.geonames.org/statistics/


Continent codes :
AF : Africa geonameId=6255146
AS : Asia geonameId=6255147
EU : Europe geonameId=6255148
NA : North America geonameId=6255149
OC : Oceania geonameId=6255151
SA : South America geonameId=6255150
AN : Antarctica geonameId=6255152


feature classes:
A: country, state, region,...
H: stream, lake, ...
L: parks,area, ...
P: city, village,...
R: road, railroad
S: spot, building, farm
T: mountain,hill,rock,...
U: undersea
V: forest,heath,...


If you find errors or miss important places, please do use the wiki-style edit interface on our website
https://www.geonames.org to correct inaccuracies and to add new records.
Thanks in the name of the geonames community for your valuable contribution.

Data Sources:
https://www.geonames.org/data-sources.html


More Information is also available in the geonames faq :

https://forum.geonames.org/gforum/forums/show/6.page

The forum : https://forum.geonames.org

or the google group : https://groups.google.com/group/geonames

5 changes: 5 additions & 0 deletions requirements.txt
@@ -0,0 +1,5 @@
googletrans==3.1.0a0
transliterate==1.10.2
Flask==2.2.2
pandas~=1.3.4
pytz~=2021.3
178 changes: 178 additions & 0 deletions script.py
@@ -0,0 +1,178 @@
import pandas as pd
from googletrans import Translator
from difflib import SequenceMatcher
from transliterate import translit
from datetime import datetime
from pytz import timezone
from flask import Flask, request, jsonify


class GeoName:

    def __init__(self):
        """ Read the data from the file.
        Keep only populated places (feature class 'P').
        Replace NaN values with empty strings.
        """
        headers = [
            'geonameid',
            'name',
            'asciiname',
            'alternatenames',
            'latitude',
            'longitude',
            'feature class',
            'feature code',
            'country code',
            'cc2',
            'admin1',
            'admin2',
            'admin3',
            'admin4',
            'population',
            'elevation',
            'dem',
            'timezone',
            'date'
        ]
        data = pd.read_csv("RU.txt", sep="\t", names=headers, low_memory=False)
        self.data_frame = data[data['feature class'] == 'P'].reset_index(drop=True)
        self.data_frame = self.data_frame.fillna('')

    def get_by_id(self, name_id: int):
        """ Get information about a populated place by its geonameid. """
        return self.data_frame[self.data_frame['geonameid'] == name_id]

    def get_page(self, page_num: int, display_num: int):
        """ Get one page of city information.
        :param page_num: page number (starting from 1).
        :param display_num: number of cities per page.
        :return: The requested page, or an empty DataFrame if the page is invalid.
        """
        # Reject missing or non-positive arguments instead of raising.
        if not page_num or not display_num or page_num < 1 or display_num < 1:
            return pd.DataFrame()
        start_index = (page_num - 1) * display_num
        # The last page is valid too; only a start past the end is rejected.
        if start_index >= len(self.data_frame):
            return pd.DataFrame()
        return self.data_frame.iloc[start_index:start_index + display_num]

    def search_in_string(self, city_name):
        """ Search for a name in the alternate names.
        :param city_name: City name.
        :return: The cities whose alternate names include the given name.
        """
        wanted = city_name.lower()
        # True for rows where any comma-separated alternate name matches exactly.
        mask = self.data_frame['alternatenames'].str.split(',').apply(
            lambda names: any(name.lower() == wanted for name in names))
        return self.data_frame[mask]

    def find_city(self, city_name):
        """ Find the cities with the given name.
        :param city_name: City name.
        :return: The matching cities, sorted by population in descending order.
        """
        # Plain case-insensitive comparison, so regex characters in the
        # query are not interpreted as a pattern.
        by_name = self.data_frame[self.data_frame['name'].str.lower() == city_name.lower()]
        found_cities = pd.concat([by_name, self.search_in_string(city_name)]).drop_duplicates()
        return found_cities.sort_values('population', ascending=False).reset_index(drop=True)

    def translate_name(self, name):
        """ Translate a name from Russian to English.
        :param name: City name in Russian.
        :return: The name in English.
        The translation and the transliteration of the name are compared.
        If they are similar, the translation is used, otherwise the transliteration.
        For example, the names Zyabrikovo and Ladnoye are translated incorrectly.
        """
        translator = Translator()
        name_1 = translator.translate(name).text.title()
        name_2 = translit(name, reversed=True)
        if SequenceMatcher(None, name_1, name_2).ratio() >= 0.8:
            return name_1
        else:
            return name_2

    def find_matches(self, part_name):
        """ Find similar names.
        :param part_name: A city name or part of one.
        :return: The cities with a similar name.
        """
        found_cities = pd.concat([
            self.data_frame[self.data_frame['name'].str.find(part_name) != -1],
            self.data_frame[self.data_frame['alternatenames'].str.find(part_name) != -1],
        ]).drop_duplicates()
        return found_cities.sort_values('population', ascending=False).reset_index(drop=True)


app = Flask(__name__)
app.config['JSON_SORT_KEYS'] = False
app.config['JSON_AS_ASCII'] = False
geo_date = GeoName()


@app.route('/get_info/<int:name_id>', methods=['GET'])
def get_info(name_id):
    """ Get information about a populated place by its geonameid. """
    city_info = geo_date.get_by_id(name_id)
    if city_info.empty:
        return "No matches found", 400
    return jsonify(city_info.to_dict(orient="records"))


@app.route('/get_page', methods=['GET'])
def get_page():
    """ Get one page of city information.
    Query parameter 'page': page number.
    Query parameter 'num': number of cities per page.
    :return: The requested page with the given number of cities.
    """
    page_num = request.args.get('page', type=int)
    display_num = request.args.get('num', type=int)
    page = geo_date.get_page(page_num, display_num)
    if page.empty:
        return "Invalid page", 400
    return jsonify(page.to_dict(orient="records"))


@app.route('/get_compare', methods=['GET'])
def get_compare():
    """ Compare two cities.
    Query parameter 'first': name of the first city.
    Query parameter 'second': name of the second city.
    :return: Information about both cities, which one lies further north,
    whether their time zones match, and by how many hours they differ.
    """
    first_city = request.args.get('first')
    second_city = request.args.get('second')
    if not first_city or not second_city:
        return "Empty request", 400
    first_city = geo_date.find_city(geo_date.translate_name(first_city))
    second_city = geo_date.find_city(geo_date.translate_name(second_city))
    if first_city.empty or second_city.empty:
        return "No matches found", 400
    # Take the most populated match for each name.
    first_city = first_city.loc[first_city.index[0]]
    second_city = second_city.loc[second_city.index[0]]
    northern = first_city['name'] if (first_city['latitude'] > second_city['latitude']) else second_city['name']
    timezone_equal = first_city['timezone'] == second_city['timezone']
    # Compare UTC offsets at the same instant; subtracting wall-clock hours
    # gives wrong results when the two local times straddle midnight.
    now = datetime.now(timezone('UTC'))
    offset_1 = now.astimezone(timezone(first_city['timezone'])).utcoffset()
    offset_2 = now.astimezone(timezone(second_city['timezone'])).utcoffset()
    time_difference = abs(offset_1 - offset_2).total_seconds() / 3600
    return jsonify(first_city.to_dict(), second_city.to_dict(), northern, timezone_equal, time_difference)


@app.route('/get_matches', methods=['GET'])
def get_matches():
    """ Find similar names.
    Query parameter 'name': a city name or part of one.
    :return: The names of the cities with a similar name.
    """
    city_name = request.args.get('name')
    if city_name is None or city_name == "":
        return "Empty request", 400
    name_list = geo_date.find_matches(city_name)['name'].to_list()
    if not name_list:
        return "No matches found", 204
    return jsonify(name_list)  # .to_json(orient='records')


# driver function
if __name__ == '__main__':
    app.run(host='127.0.0.1', port=8000, debug=True)
