Child pages
  • Central Country/State/City database with Multilingual support
Skip to end of metadata
Go to start of metadata

Imported From: http://groups.google.com/group/in-portal-dev/browse_thread/thread/abd9374373f3a34e#

Have have came across major improvement and at the same time feature that will benefit all users.

We need to refactor how we currently store and use Countries and States so it's:

  1. All data (countries/states/cities) and their translations are stored in it's own table - independent from current Phrases table (quite big) in order to escape table JOINs.
  2. Add functionality to include States/Provinces and Cities for all Countries world wide - not only US and Canada as we have now.
  3. New Sections to Manage all of it (Countries/States/Cities), including relations and translations.
  4. Connect all 3 fields (country, state, city) to ALL forms that are using these now + add auto-complete drop-down for the City field for selected Country/State combination.
  5. Create the ability to import States and Cities data for selected Countries right in the In-Portal Admin via web-service running on In-Portal.com. This is quite critical since initially we'll be shipping only US and Canada States and Cities. While complete database is about 3 million records.

Let me know if I am missing anything here! 

PS.

By the way, we are already in the process of working on In-Portal 5.1.0 which will include lots of good stuff.

You can checkout the list blow. While it's not final, I think it's most of it, though http://tracker.in-portal.org/search.php?project_id=1&sticky_issues=on&target_version=5.1.0&sortby=last_updated&dir=DESC&hide_status_id=90.

25 Comments

  1. With multilanguage tables there is such problem that at addition/
    removal of languages from PHP is started ALTER TABLE SQL to add/remove
    corresponding columns. If all cities will be in one table such SQL
    will be executed very long (about10 minutes). For 10 minutes in the
    majority of systems the server will stop PHP-script by timeout, and
    this is inadmissible.

    I offer:

    1) Store cities not in one table, but in several. For example, on one
    for everyone state and for everyone country (in case there are no
    states at country). That is, in about 4500 tables with columns

    CityId
    Name - multilanguage field, required

    And with table names:
    1.1) cities_US_IL - cities of state Illinois
    1.2) cities_SG - cities of country Singapore (in Singapore there are
    no states)

    2) Make non-standard unit "cities", adhered not to one table, but to
    group of tables, and in a concrete case, depending on the parameters
    (country, state), choosing one table from group and working with it in
    the standard way.
    3) Make changes in core unit "languages" and in multilanguage_helper,
    3.1) that at addition/removal languages changes in multilingual tables
    structure became by repeated start of a PHP-script with use
    AjaxProgressBar (for example to process on 10 tables for one loop)
    3.2) that the non-standard cities unit was considered, and all its
    tables instead of one have been changed
    4) Store countries in the separate table with structure

    CountryId
    Name - multilanguage field, required
    Code - a 2-symbolical code, unique, required
    Code3 - a 3-symbolical code

    And create unit "countries" for manipulations with countries. At new
    соuntry creation it is necessary to create the table for cities
    automatically. At change of field Code - rename tables with cities of
    given country.

    5) Store states in the separate table with structure

    StateId
    Name - multilanguage field, required
    Country - a 2-symbolical code, required, related to Country->Code
    Code - a 2-symbolical code, required

    Each Country + Code combination must be unique in this table.

    And create unit "states" for manipulations with states. At new state
    creation it is necessary to create automatically the table for cities.
    At change of field Code - rename the table with cities of given state.

  2. Hi Erik,

    Thanks for your input and great ideas on implementation.

    I personally think this is the right direction, but still needs to
    fine tuning before we finalize and file this as a feature request.

    Some of my comments follows here:

    1. I believe it's a bit too MANY tables to have just for this
    functionality. What if we start with listing actual numbers of
    Countries, States and Cities we have so we evaluate and figure out the
    amount of data we are dealing with. While I am sure you already have
    this information it's still not full clear to us.

    2. What's your take (opinion) on the Import part of this data. Do you
    agree it's not really necessary to have all Ci

    3. I was giving a second thought of having City names as Multilingual
    field. Just imaging amount of text to be translated - I don't know if
    this will ever happen at all. Can you agree that English version
    probably won't be changed at all. Yes, I think it's possible for
    Countries and I think we can actually do a trick and have Country
    names by default entered in it's local translation (to be honest not
    sure if it's that important and will matter too).

    Please let me know your thoughts.

    Others (EVERYONE) please also share you mind here!

    DA.

  3. Yes, I'm also interested in numbers and how much exactly are cities inside
    each country and each state.

  4. Hello,

    I bring my tought to your discussion.

    Since now, I never had a customer who needed all countries for
    registration/shipping, and I've always deleted all countries except
    the 2-3 they needed.

    I'll let you define the best way to implement his in DB, my idea is to
    be able to import .csv files, this way we will benefit of :

    - adding/deleting countries easily without putting hands in DB

    - lightening the DB when only few countries is needed (sometimes only
    1 is requested for a national-wide business)

    - giving to community updated files on in-portal.com, as well as new
    country files (with more details)

    I wait your opinion !

    Phil.

  5. Lists of the countries and states with quantity of cities I send for
    Alex and Dmitry by e-mail.

    What does "too many tables" mean? As far as I know in MySQL there is
    no restriction on quantity of tables in a database.

    Yes, probably, it is possible to add in table "Countries" field
    "NeedCities". If NeedCities = true, then system will manage tables
    with cities of this Country. If NeedCities = false, then system will
    work without requesting those tables.

    I agree with Phil that CSV-import of cities is necessary. On the in-
    portal.com  site I suggest to make the catalog for downloading of
    corresponding CSV-files.

    Files I suggest to call using value "Locale" from "Configuration -
    Language Packs" section. For example, USA cities in Russian will be in
    a US-cities-ru-RU-locale.csv file. At import corresponding tables of
    cities (if they still are not present) will be created, and also check
    probably is necessary that the administrator could not import
    secondary language translations before Primary Language translation is
    imported. In CSV-files at such approach enough two fields

    1) CityId - To attribute Secondary Translations to the cities created
    during import Primary Translation.
    2) Translation - The city name in "locale" language.

  6. There is still such idea - not to include at all in In-Portal the
    table with cities, and instead to make common web service which gives
    out city suggestions in the necessary language, and in In-Portal to
    make only references to this service.
  7. We also have imagined such idea with Dmitry, but maybe we forgot to file it
    here.

  8. Hi guys,

    1. Let's describe that scenario with State and City here too so we can
    fully evaluate this approach.

    2. Erick if possible would you please list here the following:

    a. Total Number of Countries.
    b. Total Number of States/Provinces (sum all Counties).

    - Also are there any Countries without State/Province (how many if you
    know?).
    - Also are there any State/Province without Cities (how many if you
    know?).

    Example in the list you sent:
    AO      Luanda 0
    BD      Khulna 0

    c. Total Number of Cities (sum all Countries).

    Thanks!

    DA.

  9. Total Number of Countries = 242
    Total Number of States/Provinces = 4258

    There are countries without States/Provinces = 49
    There are States/Provinces without Cities = 717

    The scenario I imagine - we will make web service which gives out city
    suggestions and we will set it's URL as default value in In-Portal
    configuration. If our web service seems to someone not absolutely
    convenient - he can download our web service's source code, put it on
    the his own server, modernise (for example, to add translations of
    cities into languages which are not present in our web service).

  10. So,

    1) in In-Portal project

    1.1) Store countries in the separate table with structure

    CountryId
    Name - multilanguage field, required
    Code - a 2-symbolical code, unique, required
    Code3 - a 3-symbolical code
    IsInUse - tityint(1) - switch on or off countries. If country is off,
    then it is not shown in front-end dropdowns.

    And create unit "countries" for manipulations with countries.

    1.2) Store states in the separate table with structure

    StateId
    Name - multilanguage field, required
    Country - a 2-symbolical code, required, related to Country->Code
    Code - a 2-symbolical code, required

    Each Country + Code combination must be unique in this table.

    And create unit "states" for manipulations with states

    1.3) In "Countries" and "States" sections make button "Import
    Translation", supposing csv file upload.
    CSV file name must have part, which identify translation language.
    For example Russian language:
    1.3.1) states - states-ru-RU-locale.csv
    1.3.2) countries - countries-ru-RU-locale.csv

    1.4) Add configuration variable "Cities Suggestion WEB service URL",
    by default pointing on In-Portal site's web service. If this variable
    is not empty, then on city field input suggestions will be loaded by
    ajax.

    1.5) Change all existing address input forms to use new countries/
    states data structure

    2) Make In-Portal Customization "City Suggestions WEB Service"

    2.1) Store cities not in one table, but in several. For example, on
    one
    for everyone state and for everyone country (in case there are no
    states at country). That is, in about 4500 tables with columns

    CityId
    Name - multilanguage field, required

    And with table names:
    2.1.1) cities_US_IL - cities of state Illinois
    2.1.2) cities_SG - cities of country Singapore (in Singapore there
    are
    no states)

    2.2) Make non-standard unit "cities", adhered not to one table, but
    to
    group of tables, and in a concrete case, depending on the parameters
    (country, state), choosing one table from group and working with it
    in
    the standard way.

    2.3) Make City Suggestions interface. Sample request to get cities of
    Illinois in Russian, started with 'ch':
    http://project.path/get_cites.html?locale=-ru-RU&country=US&state=IL&city=ch
    Totally, there may be 4 parameters in such request:
    2.3.1) locale - define language, required
    2.3.2) country - define country, required
    2.3.3) state - define state, required for countries, which have
    states, not used for contries which have no states
    2.3.4) city - symbols, with which city name starts, required/

  11. Prefix names must be "country" instead of "countries", "state" instead of
    "states" and so on. Why link state to country by county iso code, when we
    already have it's ID. I propose we link by IDs when possible. Table naming
    seems ok, for country without states it would be "Cities_LVA_ANY" and with
    states "Cities_USA_IL". Also I propose to use 3-symbol ISO code instead of
    2-symbol ISO code in table names.

    About city suggestion:
    Of course we shouldn't pass locale to url, because we already know it from
    current site language. Also this all should be and event, like
    OnSuggestCities that will return data in JSON format. Here also we should
    use 3-symbol ISO code, as we use it in all in-portal modules already.

    Field "IsInUse" of course should be named "Enabled" or "Status" as usual in
    In-Portal.

    1.4) We won't be loading city suggestions from other site, so maybe you
    thought about url to http://www.in-portal.com site to script, which will
    return that CSV file?

  12. Sure, we may use 3-symbol codes instead of 2-symbol codes. In this
    case additional time will be required for data convertation, because
    in existing "cities" table 2-symbol codes are used now. Also, 2-symbol
    codes are preferable, because they are used in site domains, so, if I
    want to get all "russians" from PortalUser table, I need know only one
    code - "RU", to write SQL

    SELECT Email
            FROM inp_PortalUser WHERE Country='RU'
            OR      RIGHT(Email,3) = '.ru'

    In other case, there additional time required to find additional 3-
    symbol ISO code.

    Why relation between states and countries by country code is
    preferable. This is so because in address fields are stored country
    codes, not country IDs. For example, in PortalUser record we can see
    "USA", not 225, and then, if we need get all states for this country,
    relation between states AND countries by ID means excessive join in
    SQL request. Also, it is more convenient to work directly with DB data
    (for example, in PMA) when fields contain reasonable values, not more
    abstract codes.

    1.4) yes, by default there must be URL to www.in-portal.com site, but,
    in some In-Portal customization may be special conditions and then
    www.in-portal.com site's service may be replaced to fit them.
    Developer of this customization may create another cities suggestion
    service, install it on other domain and join it by change this one
    paremeter in In-Portal configuration.

  13. All except 1st paragraph ok. User countries are stored using 3-symbol ISO
    code on all projects, where user can specify country, no cities database
    should be converted to use 3-symbol country code.

  14. Hey mates,

    Here is a quick list of Variables and basic process flow we are looking to
    have with this functionality:

    1. Country/State/City Suggestion service.

    To enable this we need a variable - I suggest to have variable named =
    EnableAddressSuggestions ( translation - "Enable Address Suggestions (ie.
    Registration form)" )

    System will check whether to use current (OLD fashion) style or NEW on User
    Registration when process goes as user selects:

    - Country
    - State/Province
    - City (suggestions pulled from In-Portal web service)
    - Zip
    - Street and so on

  15. I guess, it is impossible to use SOAP for suggestuions, because those
    will be taken by javascript/ajax, not from PHP code. Team Leader Alex
    also recomends suggestions output as JSON array and this is
    incompatible with SOAP, as I understand.
  16. Not completely right. We could ask php code on our site (http://www.site.com)
    to get SOAP suggestions from http://www.in-portal.com, then cache them
    locally in database and then return result as JSON to form, who requested
    that on our site. It's like our site acts as proxy for that all stuff.

    On Thu, Feb 25, 2010 at 6:50 PM, Erik Snarski <s...@jamajo.lv> wrote:
    > I guess, it is impossible to use SOAP for suggestuions, because those
    > will be taken by javascript/ajax, not from PHP code. Team Leader Alex
    > also recomends suggestions output as JSON array and this is
    > incompatible with SOAP, as I understand.

  17. And that proxy-solution with excessive cache development means "SPEED
    UP the process" which goal is marked by Team Leader Dmitry?
  18. No excessive cache. I just think, that city auto-complete should no go to
    http://www.in-portal.com for each form. With all that you've forgot original
    idea: administrator import only needed cities and auto-complete goes from
    out site, not in-portal.com at all.

    On Fri, Feb 26, 2010 at 6:43 AM, Erik Snarski <s...@jamajo.lv> wrote:
    > And that proxy-solution with excessive cache development means "SPEED
    > UP the process" which goal is marked by Team Leader Dmitry?

  19. I do not forget original idea, and I remember also that idea is
    unacceptable because of possible quantity and size of necessary tables.
  20. Record count in database doesn't matter here, but total table size on disk
    in MB matters, because hosting takes money for hard drive space used. For
    example how much does table with all cities of all us states weight?

    On Fri, Feb 26, 2010 at 10:46 AM, Erik Snarski <s...@jamajo.lv> wrote:
    > I do not forget original idea, and I remember also that idea is
    > unacceptable because of possible quantity and size of necessary tables.

  21. Hi Erik, Alex,

    Yes, I think it's a good idea to have ability to store results locally so we
    don't get overwhelmed by traffic.

    I suggest that we do both things - Store/Cache Results returned from
    In-Portal.com API and skip Cache results.

    What you think?

    DA.

  22. Table with all USA cities (without multilanguage fields) may have
    about - 15 MB size
    All cities table on resumark (without multilanguage fields) - size is
    275.2 MB

    My opinion is - cache is not required, because cities suggestions SQL
    use indexes and are fast.

  23. Another important point about lightening DB: even if you have
    unlimited space for your DB, when it comes to backup, we all prefer a
    tiny gz.

    Also, when you perform a request on a big table, few thousandth of a
    second multiplied by "n" requests would create response lag for all DB
    requests, just for a city name, which is a non-critical information
    that users all over the world have entered by themselves the past 20
    years.

    I hope you don't take it personnaly Eric, but I stick on the idea to
    import only needed countries, or to use suggestion service as
    described.

    Dmitry, the Google Address Mapping Service is an amazing start to
    enhance the shipping engine.
    For example, In-Commerce could be able to provide, in future release
    (or custom dev), the distance needed for each delivery, plus complete
    stats about distances, sorted for orders/deliveryman/total amount...
    With this kind of feature, In-Portal could be of interest for bigger
    projects, isn't it?

  24. Erik, Alex,

    We need to coordinate this task since I know BOTH of you are working on it's
    different parts.

    Alex, did you have a chance to explain to Erik your part on this task?

    Hi Phil, we'll come back to part with Google Map Services later in this
    discussion.

    DA.

  25. Closing due no immediate benefits from implementation.