Data Loss Prevention

  • 1.  Regular expression to locate string within 3 words

    Posted Dec 24, 2015 10:55 AM

    Need your help, DLP and regular expression gurus. I am attempting to use a data identifier to locate US passport numbers, but only when the matched value is within 2 to 3 words of the word "Visa" or "Passport". Below is my attempt; it works great in any regular expression tester (http://www.regextester.com/).

    My expression so far:

    \b(?:c[0-9a-zA-Z]{6,9}(?:\W+\w+){1,2}?\W+visa|visa(?:\W+\w+){1,2}?\W+c[0-9a-zA-Z]{6,9})\b

     

    Test data (when testing, it correctly matches the first 4 words):

    c123456df at the visa   passport
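
    To reproduce the tester result outside the browser, here is a minimal sketch using Python's `re` module. Python is only a stand-in chosen for illustration (an assumption, not what DLP runs): its engine is Perl-like, so a match here shows what a Perl-style engine does with the pattern, not whether DLP will accept it. The pattern is copied unchanged from above, so it only looks for lowercase "visa", and `find_passport_context` is a hypothetical helper name.

```python
import re
from typing import Optional

# Pattern from the question, copied unchanged. Python's `re` is a
# Perl-like stand-in here; it is NOT the regex engine DLP uses.
QUESTION_PATTERN = re.compile(
    r"\b(?:c[0-9a-zA-Z]{6,9}(?:\W+\w+){1,2}?\W+visa"
    r"|visa(?:\W+\w+){1,2}?\W+c[0-9a-zA-Z]{6,9})\b"
)


def find_passport_context(text: str) -> Optional[str]:
    """Hypothetical helper: return the first matched span, or None."""
    match = QUESTION_PATTERN.search(text)
    return match.group(0) if match else None


if __name__ == "__main__":
    # Test data from the question: number, two words, then "visa".
    print(find_passport_context("c123456df at the visa   passport"))  # c123456df at the visa
    # The second alternative covers the keyword appearing before the number.
    print(find_passport_context("visa at the c123456df"))  # visa at the c123456df
```

    Because `(?:\W+\w+){1,2}?` allows one or two intervening words, the keyword has to be the second or third word away from the number, which is the "within 3 words" requirement.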

     

    So although this works in the regular expression tester, it does not work in a DLP data identifier or directly in a policy. I am sure DLP does not use JavaScript regular expressions as we would expect. I located an article which states that it may use Boost, but I am not sure how to test Boost.

    Please help



  • 2.  RE: Regular expression to locate string within 3 words
    Best Answer

    Trusted Advisor
    Posted Dec 25, 2015 02:53 AM

    Hello,

    I don't know whether it is a typo in your question: are you trying to use a data identifier?

    Be aware that data identifier patterns do not support all regular expression features (I believe there is some information on that in the DLP admin guide).

    I don't know which DLP version you are using, but on 12.5.0, using \b in a data identifier pattern crashed our DLP system (message processing crashed and no policies were applied).

    If you don't need any validator or uniqueness, you would be better off using a regular expression rule directly.

    DLP uses Perl regular expressions; you can find some information in this article:

    https://support.symantec.com/en_US/article.HOWTO53607.html

    You could test something like this in DLP:

    c[0-9a-zA-Z]{6,9} (\w+)\s(\w+)((\s(\w+))?) (visa|passport)
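
    As a quick sanity check of what this pattern covers, here is a sketch that runs it through Python's `re` module against the test data from the question plus a few extra strings. The extra samples are hypothetical, and Python's engine is only an assumed Perl-like stand-in for whatever DLP runs, so DLP's results may differ.

```python
import re

# Pattern suggested above, copied unchanged. Python's `re` is only a
# Perl-like stand-in for the DLP engine.
ANSWER_PATTERN = re.compile(
    r"c[0-9a-zA-Z]{6,9} (\w+)\s(\w+)((\s(\w+))?) (visa|passport)"
)

# Hypothetical samples (except the first, which is the question's test data).
SAMPLES = [
    "c123456df at the visa   passport",  # matches "c123456df at the visa"
    "c123456df is on my passport",       # matches: three words in between
    "c123456df visa",                    # no match: no words in between
    "visa at the c123456df",             # no match: keyword before number
]

if __name__ == "__main__":
    for sample in SAMPLES:
        match = ANSWER_PATTERN.search(sample)
        print(repr(sample), "->", repr(match.group(0)) if match else None)
```

    Unlike the pattern in the question, this one only matches the number followed by exactly two or three words and then the keyword, each separated by a single space or whitespace character. Python's `re` also treats it as case-sensitive, so "Visa" would only match with `re.IGNORECASE`.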

     

    Regards



  • 3.  RE: Regular expression to locate string within 3 words

    Posted Jan 04, 2016 11:10 AM

    Stephane, using the regex directly, rather than via a data identifier, works. Thank you very much.