From: RobG on 26 Jun 2008 00:40

On Jun 26, 8:31 am, gentsqu...(a)gmail.com wrote:
> In a setting where I can specify only a JS regular
> expression, but not the JS code that will use it, I seek
> a regexp component that matches a string of letters,
> ignoring case. E.g., for "cat" I'd like the effect of
>
>   ([Cc][Aa][Tt])
>
> but without having to have many occurrences of [Xx].

  var reA = /cat/i;

will match the string 'cat' anywhere it appears, regardless of case. If you
want to match the word cat exactly, then:

  var reA = /\bcat\b/i;

Sample use:

  if (reA.test(string)) {
    // the pattern was found
  }

> Secondly, what is an efficient regexp that matches a
> string exactly when ALL words in a certain list occur in
> the string. I'd like the effect of
>
>   (cat.*nip|nip.*cat)

I'm not sure what you mean by "matches a string exactly"; do you mean the
word?

If you meant you want a single RegExp to match a set of patterns in any
order (i.e. in the above example either cat then nip or nip then cat), I
don't think that can be done.

Javascript regular expressions have an alternative operator '|' (kind of an
OR operator), but no equivalent for AND. Lookahead doesn't help either, as
it still requires an order to the patterns.

It can easily be done in a loop using RegExp as a constructor, but I don't
think that's what you want, e.g.

  function matchWords(s, wordArray) {
    var i = wordArray.length;
    var result = true;
    // Walk the list backwards; stop as soon as one word is missing
    while (i-- && result) {
      var re = new RegExp('\\b' + wordArray[i] + '\\b', 'i');
      result = re.test(s);
    }
    return result;
  }

  alert( matchWords('The cat ate some cat nip', ['nip','cat']) );

Note that when using RegExp to construct a regular expression, the
backslash '\' character denoting a special character must be quoted and so
becomes '\\'. Also, the regular expression's idea of a word boundary might
be different to what you expect.

> except that there are N words rather than just the two
> words "cat" and "nip". (I can assume that no word in the
> list is a prefix of any other.) Naturally, I'm looking for
> a regexp-solution that does not involve listing all
> N factorial
> many orderings.

I don't think you can do that with a single regular expression.

--
Rob

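The note about doubling the backslash can be checked directly. The following is a minimal sketch (the variable names are illustrative, not from the thread) comparing the literal form with two constructor calls, one with the backslash doubled and one without:

```javascript
// Literal form and constructor form of the same pattern.
var literal = /\bcat\b/i;
var built = new RegExp('\\bcat\\b', 'i');

// Without doubling, '\b' inside a string literal is a backspace
// character (U+0008), so this pattern looks for backspace + cat + backspace.
var wrong = new RegExp('\bcat\b', 'i');

console.log(built.source === literal.source); // true
console.log(built.test('The Cat sat'));       // true
console.log(wrong.test('The Cat sat'));       // false
```

The third pattern is still a valid regular expression, which is why the mistake tends to go unnoticed: it simply never matches ordinary text.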
From: Lasse Reichstein Nielsen on 26 Jun 2008 01:13

RobG <rgqld(a)iinet.net.au> writes:
> If you meant you want a single RegExp to match a set of patterns in
> any order (i.e. in the above example either cat then nip or nip then
> cat), I don't think that can be done.
> Javascript regular expressions have an alternative operator '|' (kind
> of an OR operator), but no equivalent for AND. Lookahead doesn't help
> either, as it still requires an order to the patterns.

How about:

  (?=.*\bcat\b)(?=.*\bnip\b)(?=.*\bfoo\b)(?=.*\bbar\b)(?=.*\bbaz\b)

I.e., several lookaheads.
It won't be pretty, and it definitely won't perform very well, but
it should be correct.

/L
--
Lasse Reichstein Nielsen
DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
'Faith without judgement merely degrades the spirit divine.'

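The reason order stops mattering is that each lookahead is zero-width: after one succeeds, the next starts scanning from the same position. A small sketch with two of the words illustrates this. The anchored variant is an assumption added here, not part of the post above; the leading `^` only keeps the engine from retrying the lookaheads at every later position.

```javascript
// Two zero-width lookaheads, both evaluated from the same start position.
var unanchored = /(?=.*\bcat\b)(?=.*\bnip\b)/i;
// Assumed variant: same test, but tried only at the start of the string.
var anchored = /^(?=.*\bcat\b)(?=.*\bnip\b)/i;

var samples = [
  'the cat likes nip', // both words, cat first
  'Nip for the CAT',   // both words, nip first
  'the cat sat',       // nip is missing
  'catnip'             // no word boundary between "cat" and "nip"
];

for (var i = 0; i < samples.length; i++) {
  console.log(samples[i], unanchored.test(samples[i]), anchored.test(samples[i]));
}
```

The first two samples match under both patterns and the last two match under neither.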
From: RobG on 26 Jun 2008 02:01

On Jun 26, 3:13 pm, Lasse Reichstein Nielsen <l...(a)hotpop.com> wrote:
> RobG <rg...(a)iinet.net.au> writes:
> > If you meant you want a single RegExp to match a set of patterns in
> > any order (i.e. in the above example either cat then nip or nip then
> > cat), I don't think that can be done.
> > Javascript regular expressions have an alternative operator '|' (kind
> > of an OR operator), but no equivalent for AND. Lookahead doesn't help
> > either, as it still requires an order to the patterns.
>
> How about:
>
>   (?=.*\bcat\b)(?=.*\bnip\b)(?=.*\bfoo\b)(?=.*\bbar\b)(?=.*\bbaz\b)
>
> I.e., several lookaheads.
> It won't be pretty, and it definitely won't perform very well, but
> it should be correct.

Cool, I thought that order would still matter. For the OP, the string needs
to be a single line, no line feeds etc.

Some play code:

  <script type="text/javascript">

  // Build one lookahead per word and join them into a single expression
  function getRE(wordArray) {
    var re = [];
    for (var i=0, len=wordArray.length; i<len; i++) {
      re.push('(?=.*\\b' + wordArray[i] + '\\b)');
    }
    return new RegExp(re.join(''), 'i');
  }

  </script>

  <textarea id="ta">The cat sat on the mat and drank the milk</textarea>
  <input id="inp0" type="text" value="milk cat sat">
  <input type="button" value="Test" onclick="
    // Make sure s is a single line of text
    var s = document.getElementById('ta').value.replace(/\s/g,' ');
    var words = document.getElementById('inp0').value.split(' ');
    var re = getRE(words);
    alert(
      'String: ' + s
      + '\n\nExpression: ' + re
      + '\n\nTest: ' + re.test(s)
    );
  ">

PS. Putting many statements inside the value of an onclick attribute is
not good form, but OK for play code. :-)

--
Rob

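The play code assumes the words contain no regular-expression metacharacters and that the text has been flattened to one line first. A possible variant is sketched below; `escapeRegExp` and `buildAllWordsRE` are hypothetical names, and the use of `[\s\S]` in place of `.` is an assumption made here so that line feeds do not need to be replaced beforehand.

```javascript
// Hypothetical helper: quote metacharacters so a word is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Variant of getRE: anchored at the start, and using [\s\S] instead of '.'
// so that each lookahead can scan across line feeds.
function buildAllWordsRE(wordArray) {
  var parts = [];
  for (var i = 0, len = wordArray.length; i < len; i++) {
    if (wordArray[i] === '') continue; // skip empties left by split(' ')
    parts.push('(?=[\\s\\S]*\\b' + escapeRegExp(wordArray[i]) + '\\b)');
  }
  return new RegExp('^' + parts.join(''), 'i');
}

var re = buildAllWordsRE('milk  cat sat'.split(' '));
console.log(re.test('The cat sat on the mat\nand drank the milk')); // true
console.log(re.test('The cat sat on the mat'));                      // false
```

Escaping only makes the word literal; `\b` still expects a word character at each end of it, so a term such as "c++" would need a different boundary test.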
From: Thomas 'PointedEars' Lahn on 26 Jun 2008 02:17

RobG wrote:
> If you want to match the word cat exactly, then:
>
>   var reA = /\bcat\b/i;

That depends on how you define a word. If you define a word as a sequence
of word characters as specified in the ECMAScript Language Specification,
Ed. 3 Final, section 15.10.2.6 (i.e. those matching /[0-9A-Za-z_]/), you are
right.

However, for example "Menü" is a word in German, and

  var reA = /\bmen\b/i;

will (only) match the "Men" in "Menü" there. That is because `ü' is not
considered a word character per the Specification, so the empty word ε
between "n" and "ü" constitutes a word boundary matched by /\b/ (as e.g.

  "Menü".match(/\bmen\b/i)

shows).

So for matching Unicode words in strings, you have to use

  var reA = /(^|\s)cat(\s|$)/i;

instead; that is, a character sequence (here: without whitespace
in-between) bounded by whitespace, or one or two input boundaries.

PointedEars
--
Anyone who slaps a 'this page is best viewed with Browser X' label on
a Web page appears to be yearning for the bad old days, before the Web,
when you had very little chance of reading a document written on another
computer, another word processor, or another network. -- Tim Berners-Lee

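The difference between the two boundary styles can be seen side by side in a short sketch. The sample sentences are made up for illustration, and "ü" is written as the escape \u00fc only to keep the source ASCII-safe.

```javascript
var asciiBoundary = /\bmen\b/i;         // \b based on [0-9A-Za-z_]
var spaceBoundary = /(^|\s)men(\s|$)/i; // bounded by whitespace or input ends

var german = 'Das Men\u00fc ist fertig'; // "Das Menü ist fertig"

// \b sees a boundary between "n" and "ü", so "Men" is reported as a word.
console.log(asciiBoundary.test(german));        // true
// The whitespace version wants whitespace or end of input after "men".
console.log(spaceBoundary.test(german));        // false
console.log(spaceBoundary.test('Men at work')); // true
```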
From: RobG on 26 Jun 2008 05:29

On Jun 26, 4:17 pm, Thomas 'PointedEars' Lahn <PointedE...(a)web.de> wrote:
> RobG wrote:
> > If you want to match the word cat exactly, then:
> >
> >   var reA = /\bcat\b/i;
>
> That depends on how you define a word. If you define a word as a sequence
> of word characters as specified in the ECMAScript Language Specification,
> Ed. 3 Final, section 15.10.2.6 (i.e. those matching /[0-9A-Za-z_]/), you are
> right.
>
> However, for example "Menü" is a word in German, and
>
>   var reA = /\bmen\b/i;
>
> will (only) match the "Men" in "Menü" there. Because `ü' is not considered
> a word character per the Specification,

Hence I included the sentence "Also, the regular expression's idea of a
word boundary might be different to what you expect."

> and so the empty word ε between "n"
> and "ü" constitutes a word boundary matched by /\b/ (as e.g.
>
>   "Menü".match(/\bmen\b/i)
>
> shows).
>
> So for matching Unicode words in strings, you have to use
>
>   var reA = /(^|\s)cat(\s|$)/i;

That expression is commonly used for matching values in the HTML class
attribute where the separator is specified as being whitespace. It is not
sufficient for matching words in general where they may be followed by
punctuation marks such as commas, semi-colons, colons, dashes, periods and
so on.

--
Rob

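One way to sit between the two positions is to spell out which characters count as part of a word and treat everything else, punctuation included, as a boundary. The sketch below is only an illustration: `WORD_CHARS` and `wordRE` are hypothetical names, the character ranges (ASCII word characters plus Latin-1 and Latin Extended letters) are an assumption, and the word passed in is assumed to contain no regular-expression metacharacters.

```javascript
// Assumed definition of "word character": ASCII letters, digits, underscore,
// plus accented Latin letters U+00C0..U+024F (minus the signs U+00D7, U+00F7).
var WORD_CHARS = 'A-Za-z0-9_\\u00C0-\\u00D6\\u00D8-\\u00F6\\u00F8-\\u024F';

// The word must be preceded by the start of input or a non-word character,
// and must not be followed by a word character (negative lookahead).
function wordRE(word) {
  return new RegExp(
    '(^|[^' + WORD_CHARS + '])' + word + '(?![' + WORD_CHARS + '])', 'i');
}

console.log(wordRE('cat').test('I saw a cat, then left.'));  // true
console.log(wordRE('cat').test('concatenate'));              // false
console.log(wordRE('men').test('Das Men\u00fc ist fertig')); // false
console.log(wordRE('men\u00fc').test('Das Men\u00fc.'));     // true
```

A leading capture group is used rather than a lookbehind, since a lookbehind was not available in the regular expressions of the time.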