Given names data
This data table, built by GRC Data Intelligence, was created by processing real-world data files and counting the occurrences of each given name. The file has been cleaned, but because it is derived from real-world data it may contain real-world errors. To reduce errors, only names found five or more times in real-world data files are released. For the
The data contains:
The table attempts to capture what is found in the real world and to lend itself to processing according to most uses. Each record is unique by country, name as written (including casing) and gender. You can therefore expect to find each name multiple times within the file in this way:
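As a rough illustration of the uniqueness rule described above, the following Python sketch treats each record as a (country, name as written, gender) triple. The field order, the country codes, the gender codes "F" and "M", and the sample rows are all assumptions made for this example and are not taken from the release file.

```python
from collections import defaultdict

# Hypothetical records: (country, name as written, gender).
# These values are invented for illustration only.
records = [
    ("NL", "Kim", "F"),
    ("NL", "Kim", "M"),   # same name and country, different gender
    ("NL", "KIM", "F"),   # same name, different casing
    ("GB", "Kim", "F"),   # same name, different country
]

# Each triple is the record key, so no triple should occur twice.
unique_count = len(set(records))
assert unique_count == len(records)

# Group by case-folded name to see how many records one name can produce.
by_name = defaultdict(list)
for country, name, gender in records:
    by_name[name.casefold()].append((country, name, gender))

kim_records = by_name["kim"]
print(len(kim_records), "records share the name 'kim'")
```

Grouping on the case-folded name is only one way to bring variants together; whether casing differences should be merged depends on how the data will be used.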
When viewing the coverage figures (here), you would therefore not expect to have more than 50% of names gendered in any country for a single-gender name.
Strings which are known not to be given names, or which are or include family names that cannot be given names, are excluded from the release file.
We advise extreme caution in attempting any genderisation process on the basis of given names. However, as this data is often used for that purpose, we have released a second table containing gender information distilled from the main file. It shows the number of occurrences found for a given name per gender per country. For example:
The gender file contains 38071 records for this release. For full information, please refer to the documentation.
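For readers who want to work with per-gender occurrence counts, here is a minimal Python sketch that turns such counts into proportions per country and name. The row layout (country, name, gender, occurrences), the gender codes, the counts, and the function name `gender_shares` are hypothetical and chosen for this example; the real column layout is described in the documentation.

```python
from collections import defaultdict

# Hypothetical rows: (country, name, gender, occurrences).
# Counts are invented for illustration only.
rows = [
    ("NL", "Kim", "F", 30),
    ("NL", "Kim", "M", 10),
    ("NL", "Anna", "F", 50),
]

def gender_shares(rows, min_total=5):
    """Return {(country, name): {gender: share}} from occurrence counts.

    Names whose total count is below min_total are skipped, since
    proportions built on very few occurrences are unreliable.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for country, name, gender, count in rows:
        totals[(country, name)][gender] += count

    shares = {}
    for key, counts in totals.items():
        total = sum(counts.values())
        if total < min_total:
            continue
        shares[key] = {g: c / total for g, c in counts.items()}
    return shares

shares = gender_shares(rows)
print(shares[("NL", "Kim")])
```

A proportion such as this is only a description of the counts, not a prediction for any individual, which is in keeping with the caution advised above.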
If you have any questions about this file, please contact us.
Sample
View a sample of 200 records from the file here.
Coverage
View the coverage of this version here. The numbers released are given in the "Found >5 times" column.
Full file documentation is available here.
Formats
Data is held in Microsoft Visual FoxPro format, but can also be provided in these formats: FoxPro 2.x (dBase III+), comma-delimited text, tab-delimited text, fixed-column-width text, and Excel (only for small files of fewer than 64 000 records). Small data sets can be e-mailed; larger sets are provided on CD-ROM.
Prices
These files are available at the price (for both files) of only EUR 950. If you have any questions regarding the files, please contact us.
This data is offered on a royalty-free basis for use in any way you wish, with this important proviso: The data may be used for whatever purpose and is royalty free, but it may not be copied or distributed in any way whatsoever when it can, in normal use, be accessed by other users. In other words, if you would like to use this data in your software package, that is allowed provided users cannot get at, or export, the data themselves.
You will be asked to agree to our terms and conditions when purchasing. Our terms, conditions and licensing structure can be viewed here.
To order
To purchase the full file, follow this link to order by credit card.
Customers
Many of our customers prefer to remain nameless for competitive reasons, and we respect this. Our customers include:
Coconut Island Software, Inc., Kea'au, USA
If you have any questions, please contact us.
GRC Data Intelligence AMSTERDAM The Netherlands