How to use isalpha in PHP

I'm working on an application that supports several languages. It tries to use the language requested by the browser and also lets the user override that choice manually. This part works fine and picks the correct templates, labels, etc.

Users sometimes have to enter text of their own, and that's where I run into issues, because the application has to accept even "complicated" languages like Chinese and Russian. So far I've taken care of the things mentioned in other postings, i.e.:

  • calling mb_internal_encoding('UTF-8')
  • setting the right encoding when rendering the web pages with <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  • checking that the content arrives correctly: mb_detect_encoding() returns UTF-8
  • trying setlocale(LC_CTYPE, "UTF-8"), which doesn't seem to work because it requires selecting one language, and I can't specify one because I have to support several. Even when I force a locale manually for testing purposes, e.g. setlocale(LC_CTYPE, "zh_CN.utf8"), ctype_alpha() still fails for Chinese text

It seems that even explicit language selection doesn't make ctype_alpha() useful.
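A per-byte check is the likely reason: ctype_alpha() tests each byte of the string with the C library's isalpha(), while Chinese and Cyrillic characters each take two or more bytes in UTF-8, and a per-byte test cannot recognize a character that spans several bytes. The small sketch below prints the byte length, the character length and the ctype_alpha() result for a few arbitrary sample strings. It assumes the mbstring extension (already used in the question) and pins LC_CTYPE to the plain "C" locale so the result does not depend on the server's environment; only the ASCII sample should pass.

```php
<?php
// Assumption: force the plain "C" locale so the output does not depend
// on the server's environment.
setlocale(LC_CTYPE, 'C');

// Arbitrary sample strings: ASCII, Cyrillic and Chinese.
$samples = ['abc', 'Привет', '中文'];

foreach ($samples as $text) {
    // ctype_alpha() looks at each byte separately, so in the C locale any
    // byte of a multibyte UTF-8 sequence makes the whole check fail.
    printf(
        "%s: %d bytes, %d characters, ctype_alpha=%s\n",
        $text,
        strlen($text),              // length in bytes
        mb_strlen($text, 'UTF-8'),  // length in characters
        var_export(ctype_alpha($text), true)
    );
}
```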

Hence the question is: how should I check for alphabetic characters in all languages?

The only idea I've had so far is to check manually against arrays of "valid" characters, but this seems ugly, especially for Chinese.

How would you solve this issue?

BenMorel

asked Jun 7, 2009 at 9:35

If you only want to check for valid Unicode letters regardless of the language used, I'd propose a regular expression (provided your PCRE extension is built with Unicode support):

// adjust pattern to your needs
// $input needs to be UTF-8 encoded
if (preg_match('/^\p{L}+$/u', $input)) {
    // OK
} else {
    // not OK
}

\p{L} matches Unicode characters with the L (Letter) property, which includes the properties Ll (lowercase letter), Lm (modifier letter), Lo (other letter), Lt (titlecase letter) and Lu (uppercase letter) (from: Regular Expression Details).
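The same check can be wrapped in a reusable function; isUnicodeAlpha() is a hypothetical name used only for this sketch. It differs from the snippet above in two deliberate ways. It anchors with \z instead of $, because in PCRE $ also matches just before a final newline, so "abc\n" would pass the original pattern. It also compares the result with === 1, because with the /u modifier preg_match() returns false (not 0) for input that is not valid UTF-8.

```php
<?php
// Hypothetical helper: true only if $input is non-empty, valid UTF-8 and
// made up solely of characters with the Unicode L (Letter) property.
function isUnicodeAlpha(string $input): bool
{
    // \z anchors at the very end of the subject; unlike $, it does not
    // also match just before a final "\n".
    // With the /u modifier, preg_match() returns false on invalid UTF-8,
    // so comparing with === 1 rejects that case as well.
    return preg_match('/^\p{L}+\z/u', $input) === 1;
}

$samples = ['Hello', 'Привет', '中文', 'abc1', "abc\n", 'two words', ''];

foreach ($samples as $sample) {
    echo json_encode($sample, JSON_UNESCAPED_UNICODE), ' => ',
        isUnicodeAlpha($sample) ? 'letters only' : 'rejected', "\n";
}
```

Note that the pattern also rejects spaces, digits and combining marks (\p{M}), which some scripts use inside ordinary words; extend the character class, e.g. [\p{L}\p{M}], if your input needs them.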

answered Jun 7, 2009 at 11:16

Stefan Gehrig

I wouldn't use an array of characters. That would get impossible to manage.

What I'd suggest is working out a 'default' language from the IP address and using that as the locale for the request. In some cases you could also get it from the browser's user-agent string. Provide the user a way to override it, so that if your default isn't correct they aren't stuck with a strange site (e.g. show on the form: 'Language set to English. If this isn't correct, please change: '). This isn't the nicest thing to provide, but you won't get any working validation otherwise, as you NEED a language/locale set in order to have sensible alpha validation (an A isn't a letter in Chinese).

answered Jun 7, 2009 at 9:49

workmad3

You can use the languages from

$_SERVER['HTTP_ACCEPT_LANGUAGE']

It contains something like

de-de,de;q=0.8,en-us;q=0.5,en;q=0.3

so you need to parse this string. Then you can use the preferred language in the setlocale() function.
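Parsing the header could look roughly like this. parseAcceptLanguage() is a hypothetical helper, not a built-in: it splits the header on commas, reads the optional q= weight (1.0 when absent), drops entries with q=0 (meaning "not acceptable") and returns the language tags sorted by weight, keeping header order for equal weights.

```php
<?php
// Hypothetical helper: returns the language tags from an Accept-Language
// header, highest q-value first.
function parseAcceptLanguage(string $header): array
{
    $entries = [];
    foreach (explode(',', $header) as $index => $part) {
        $pieces = explode(';', trim($part));
        $tag = strtolower(trim($pieces[0]));
        if ($tag === '') {
            continue; // skip empty entries, e.g. from an empty header
        }
        $q = 1.0; // default weight when no q= parameter is present
        foreach (array_slice($pieces, 1) as $param) {
            $param = trim($param);
            if (strncasecmp($param, 'q=', 2) === 0) {
                $q = (float) substr($param, 2);
            }
        }
        if ($q > 0) { // q=0 means "not acceptable"
            $entries[] = ['tag' => $tag, 'q' => $q, 'index' => $index];
        }
    }
    // Sort by q descending; for equal q keep the original header order.
    usort($entries, function (array $a, array $b): int {
        return [$b['q'], $a['index']] <=> [$a['q'], $b['index']];
    });
    return array_column($entries, 'tag');
}

// Usage: the header is absent on the command line, hence the ?? ''.
$languages = parseAcceptLanguage($_SERVER['HTTP_ACCEPT_LANGUAGE'] ?? '');
print_r(parseAcceptLanguage('de-de,de;q=0.8,en-us;q=0.5,en;q=0.3'));
```

Turning a tag such as de-de into a locale name (de_DE.UTF-8, de_DE.utf8, ...) is platform dependent; setlocale() accepts an array of candidate names and returns the first one it could set, or false. If the intl extension is available, Locale::acceptFromHttp() is a ready-made alternative to a hand-written parser.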

answered Jun 7, 2009 at 10:40

slosd


This is an encoding issue rather than a language detection issue, because UTF-8 can encode any Unicode character.

The best approach is to use UTF-8 throughout your project: in your database, in your output and as expected encoding for the input.

  • Output: Make sure you encode your data with UTF-8 and declare that in the Content-Type field of the HTTP header, not just in the document itself.
  • Input: If you’re using forms, declare the expected encoding in the accept-charset attribute.
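Both points could be sketched like this, assuming the mbstring extension that the question already uses. renderNameForm() and isValidUtf8() are hypothetical helper names. mb_check_encoding() reports whether the whole string is well-formed in the given encoding, which is a stricter test than asking mb_detect_encoding() for a guess, and it is worth running before any letter validation.

```php
<?php
// Output: declare UTF-8 in the HTTP Content-Type header itself
// (header() has to run before any output is sent).
header('Content-Type: text/html; charset=UTF-8');

// Input: ask the browser to submit the form as UTF-8.
function renderNameForm(): string
{
    return '<form method="post" accept-charset="UTF-8">'
        . '<input type="text" name="name">'
        . '<button type="submit">Send</button>'
        . '</form>';
}

// Input again: don't rely on the declaration alone; reject bytes that
// are not well-formed UTF-8 before any further validation.
function isValidUtf8(string $value): bool
{
    return mb_check_encoding($value, 'UTF-8');
}

echo renderNameForm(), "\n";
```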

answered Jun 7, 2009 at 13:05

Gumbo
