Having a Dutch surname like "Spring in 't Veld" (roughly "Jump in th' Field") causes all kinds of problems with form entries and with editors that add smart quotes:
- Many non-sanitized SQL queries fail.
- Buggy URL or HTML parsers produce output like:
    /spring-in/
    <meta content='Spring in 't Veld ...
- Page titles like: Spring in /'t Veld
- Smart quotes added by Word or rich text editors ("Spring in ’t Veld") cause problems with sorting and identity consolidation.
- Stripping the quote character trips a two-letter minimum requirement (the leftover "t" is a single letter).
- etc.
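Most of the parser bugs above come from inserting the name into HTML or a URL without escaping it. A minimal sketch using only Python's standard library (`html.escape` and `urllib.parse.quote`); the helper names `meta_tag` and `url_path_segment` are made up for illustration:

```python
import html
from urllib.parse import quote


def meta_tag(name: str) -> str:
    # html.escape(..., quote=True) escapes &, <, >, " and ', so the
    # apostrophe in "'t" cannot close the single-quoted attribute early.
    return "<meta content='" + html.escape(name, quote=True) + "'>"


def url_path_segment(name: str) -> str:
    # safe="" percent-encodes everything outside letters, digits and "_.-~"
    # (including "/"), so the whole name survives as one path segment.
    return "/" + quote(name, safe="") + "/"


name = "Spring in 't Veld"
print(meta_tag(name))          # <meta content='Spring in &#x27;t Veld'>
print(url_path_segment(name))  # /Spring%20in%20%27t%20Veld/
```

Percent-encoding keeps the name intact; generating a readable ASCII slug instead is a separate, lossy choice and is where truncations like "/spring-in/" tend to creep in.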
It is unlikely that people will name their son
Robert'); DROP TABLE Students;--
But you should at least prepare your db queries for a quote. :) http://xkcd.com/327/
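"Preparing" the queries here means binding the name as a parameter instead of pasting it into the SQL string. A sketch using Python's built-in `sqlite3` module with an in-memory database; the table and helper names are illustrative assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT NOT NULL)")


def add_student(conn: sqlite3.Connection, name: str) -> None:
    # The "?" placeholder sends the value separately from the SQL text,
    # so quotes in the name are treated as data, never as syntax.
    conn.execute("INSERT INTO students (name) VALUES (?)", (name,))


def find_student(conn: sqlite3.Connection, name: str):
    row = conn.execute(
        "SELECT name FROM students WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None


for name in ["Spring in 't Veld", "Robert'); DROP TABLE Students;--"]:
    add_student(conn, name)

print(find_student(conn, "Spring in 't Veld"))                   # Spring in 't Veld
print(conn.execute("SELECT COUNT(*) FROM students").fetchone()[0])  # 2
```

Both names are stored verbatim and the table survives, because neither string is ever parsed as SQL.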
Your roughly translated HN name is also funny: Blue buttock gargle ;)
But yes, programmers have a hard time with Unicode. It's a shame not all programming languages use Unicode.
I also think it has to do with poor research. It isn't hard to check different types of family names and naming orders.
There are a lot of countries where the family name comes first, followed by the personal name.
I'm still trying to figure out the best way to store names. Maybe two fields will fit all:
family_name (Spring in 't Veld)
name (Robert Spring in 't Veld)
The family name can be used for grouping and sorting. But you need to store the name as the user entered it. Don't try to split it into parts; just leave it as is.
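A sketch of the two-field idea in Python (3.9+ for the annotations). The `sort_key` normalization (NFC, mapping U+2019 to a plain apostrophe, then casefolding) is an assumption of this example, applied only to a derived key so the stored strings stay exactly as entered; it also lets a smart-quoted "Spring in ’t Veld" group with "Spring in 't Veld". The class and sample names are hypothetical:

```python
import unicodedata
from dataclasses import dataclass


@dataclass
class PersonName:
    name: str         # full name exactly as the user entered it
    family_name: str  # used only for grouping and sorting


def sort_key(family_name: str) -> str:
    # Hypothetical comparison key; never written back to the record.
    s = unicodedata.normalize("NFC", family_name)
    s = s.replace("\u2019", "'")  # smart apostrophe -> plain apostrophe
    return s.casefold()


people = [
    PersonName("Robert Spring in 't Veld", "Spring in 't Veld"),
    PersonName("Anna Spring in \u2019t Veld", "Spring in \u2019t Veld"),
    PersonName("Jan de Vries", "de Vries"),
]

people.sort(key=lambda p: sort_key(p.family_name))  # stable sort

groups: dict[str, list[str]] = {}
for p in people:
    groups.setdefault(sort_key(p.family_name), []).append(p.name)

print(groups)
```

Locale-specific collation (for example, how Dutch particles such as "in 't" should be ordered) is not handled here; a real system would probably use a proper collation library rather than plain string comparison.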
You can't (simply) use family names to group Polish names (and I think Slavic names in general), because the spelling of a surname often (but not always) depends on gender, e.g. the wife of Mr. Kowalski is Mrs. Kowalska.
Do you have any opinion on what the practices should be with names like yours? Should we somewhere between the view layer and the persistence layer convert to a canonical form? If so, can you provide any links for widely-preferred ways to do so?
About to find myself building some such systems and would prefer to do it right…
Names are just Unicode strings, which you should not fold, spindle or mutilate. Why do you care which bit is the family name, or if this person has a family name, or even if there is such a thing as a family name in their family's culture? Why even try for a canonical form?
Even better, make name a _list_ of Unicode strings, so people who have multiple names (Chinese/Western, pre/post marriage, pre/post op) can give you the whole list to search on.
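A minimal sketch of storing a list of names and searching across all of them. The `Person` class, the sample names and the case-insensitive substring rule are hypothetical choices for illustration (Python 3.9+):

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    # Every name the person gave us, stored verbatim; no first/last split.
    names: list[str] = field(default_factory=list)


def matches(person: Person, query: str) -> bool:
    # Case-insensitive substring match against any of the stored names.
    q = query.casefold()
    return any(q in n.casefold() for n in person.names)


def search(people: list[Person], query: str) -> list[Person]:
    return [p for p in people if matches(p, query)]


people = [
    Person(["王秀英", "Wang Xiuying", "Grace Wang"]),  # Chinese/Western
    Person(["Anna de Vries", "Anna Jansen"]),          # pre/post marriage
]

print([p.names[0] for p in search(people, "jansen")])  # ['Anna de Vries']
```

Because nothing is parsed, the same code handles names with particles, apostrophes or no family name at all.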
If asking for their credit-card details, ask "Name As Printed On This Credit Card" and store that as well so you can bill them. That's really a property of the card, not the person.
And then just add an "Informal Name" field so you're not stuck guessing which bit is the informal name, or writing "Dear Roderick Frederick Ronald Arnold William MacArthur McBan", which does sound kind of dorky...
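Putting those suggestions together, a hypothetical record layout might keep the printed name on the card object and let the person supply an informal name. The class names, fields and sample data below are assumptions for illustration only:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Card:
    last4: str            # hypothetical: only the last digits are kept here
    name_as_printed: str  # a property of the card, used only for billing


@dataclass
class Person:
    names: list[str]                     # stored verbatim, as entered
    informal_name: Optional[str] = None  # what the person asked to be called
    cards: list[Card] = field(default_factory=list)


def greeting(person: Person) -> str:
    # Use the informal name the person supplied; otherwise fall back to the
    # first name string as entered instead of guessing which part to use.
    return f"Dear {person.informal_name or person.names[0]},"


roddy = Person(
    names=["Roderick Frederick Ronald Arnold William MacArthur McBan"],
    informal_name="Roddy",  # hypothetical sample value
    cards=[Card("4242", "RODERICK MCBAN")],
)

print(greeting(roddy))  # Dear Roddy,
```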