Issue
I have this function that uses regexes to identify a credit card number in an input string and mask everything except the last 4 digits:
public CharSequence obfuscate(CharSequence data) {
    String[] result = data.toString().replaceAll("[^a-zA-Z0-9-_*]", " ").trim().replaceAll(" +", " ").split(" ");
    for (String str : result) {
        String originalString = str;
        String cleanString = str.replaceAll("[-_]", "");
        CardType cardType = CardType.detect(cleanString);
        if (!CardType.UNKNOWN.equals(cardType)) {
            String maskedReplacement = maskWithoutLast4Digits(cleanString, replacement);
            data = data.toString().replace(originalString, maskedReplacement);
        }
    }
    return data;
}

static String maskWithoutLast4Digits(String input, String replacement) {
    if (input.length() < 4) {
        return input;
    }
    return input.replaceAll(".(?=.{4})", replacement);
}
// pattern enum
public enum CardType {
    UNKNOWN,
    VISA("^4[0-9]{12}(?:[0-9]{3}){0,2}$"),
    MASTERCARD("^(?:5[1-5]|2(?!2([01]|20)|7(2[1-9]|3))[2-7])\\d{14}$"),
    AMERICAN_EXPRESS("^3[47][0-9]{13}$"),
    DINERS_CLUB("^3(?:0[0-5]|[68][0-9])[0-9]{11}$"),
    DISCOVER("^6(?:011|[45][0-9]{2})[0-9]{12}$");

    private Pattern pattern;

    CardType() {
        this.pattern = null;
    }

    CardType(String pattern) {
        this.pattern = Pattern.compile(pattern);
    }

    public static CardType detect(String cardNumber) {
        for (CardType cardType : CardType.values()) {
            if (null == cardType.pattern) continue;
            if (cardType.pattern.matcher(cardNumber).matches()) return cardType;
        }
        return UNKNOWN;
    }

    public Pattern getPattern() {
        return pattern;
    }
}
input1: "Valid American Express card: 371449635398431".
output1: "Valid American Express card: ***********8431"
input2: "Invalid credit card: 1234222222222" // does not match any credit card pattern
output2: "Invalid credit card: 1234222222222"
input3: "Valid American Express card with garbage characters: <3714-4963-5398-431>"
output3: "Valid American Express card with garbage characters: <***********8431>"
This is not the best way to do the masking, since the method will be called for each tag in huge HTML documents and for each line in huge text files. How can I improve the performance of this method?
Solution
This post is based solely on the comments under the answer above, and in particular on this comment from the OP:
And also the input string can be "my phone number 12345678 and credit card 1234567890"
If you're set on using regex and want to extract a phone number and/or a credit card number from a given string, you can use this Java regular expression:
String regex = "(\\+?\\d+.{0,1}\\d+.{0,1}\\d+.{0,1}\\d+)|"         // Phone Numbers
             + "(\\+{0,1}\\d+{0,3}\\s{0,1}\\-{0,1}\\({0,1}\\d+"   // Credit Cards
             + "\\){0,1}\\s{0,1}\\-{0,1}\\d+\\s{0,1}\\-{0,1}\\d+)";
To use this regex string you would want to run it through a Pattern/Matcher mechanism, for example:
String strg = "Valid Phone #: <+1 (212) 555-3456> - "
            + "Valid American Express card 24 with garbage 33.6 characters: <3714-4963-5398-431>";
final java.util.List<String> numbers = new java.util.ArrayList<>();
final String regex = "(\\+?\\d+.{0,1}\\d+.{0,1}\\d+.{0,1}\\d+)|"         // Phone Numbers
                   + "(\\+{0,1}\\d+{0,3}\\s{0,1}\\-{0,1}\\({0,1}\\d+"   // Credit Cards
                   + "\\){0,1}\\s{0,1}\\-{0,1}\\d+\\s{0,1}\\-{0,1}\\d+)";
final java.util.regex.Pattern pattern = java.util.regex.Pattern.compile(regex); // the regex
final java.util.regex.Matcher matcher = pattern.matcher(strg);                  // your string
while (matcher.find()) {
    numbers.add(matcher.group());
}
for (String str : numbers) {
    System.out.println(str);
}
With the above supplied String the Console Window would display:
+1 (212) 555-3456
3714-4963-5398-431
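Since the question says the method runs for every tag and every line, it may help to compile the pattern once and reuse it rather than compiling it on each call. The following is only a minimal sketch of that idea. The class name CandidateFinder, the method findCandidates, and the simplified pattern are assumptions made for illustration; the pattern is a stand-in and not the regex shown above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CandidateFinder {
    // Assumption: a simplified stand-in pattern, not the regex from this post.
    // It matches 8 to 19 digits that may be separated by spaces, dots,
    // parentheses or dashes, with an optional leading '+'.
    private static final Pattern CANDIDATE =
            Pattern.compile("\\+?\\d(?:[ .()-]*\\d){7,18}");

    /** Returns every candidate number substring found in the text. */
    static List<String> findCandidates(String text) {
        List<String> numbers = new ArrayList<>();
        Matcher matcher = CANDIDATE.matcher(text); // reuses the compiled pattern
        while (matcher.find()) {
            numbers.add(matcher.group());
        }
        return numbers;
    }

    public static void main(String[] args) {
        String strg = "Valid Phone #: <+1 (212) 555-3456> - "
                + "Valid American Express card 24 with garbage 33.6 characters: <3714-4963-5398-431>";
        for (String number : findCandidates(strg)) {
            System.out.println(number);
        }
    }
}
```

The static final field means the pattern is compiled once when the class loads, so each call only pays for creating a Matcher and scanning the text.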
Consider these the original phone number and credit card number substrings. Place them into respective variables such as origPhoneNum and origCreditCardNum. Now validate the numbers. The previous answer already provides a method to validate a credit card number; here is one to validate a phone number:
public static boolean isValidPhoneNumber(String phoneNumber) {
    return phoneNumber.matches("^(?!\\b(0)\\1+\\b)(\\+?\\d{1,3}[. -]?)?"
                             + "\\(?\\d{3}\\)?([. -]?)\\d{3}\\3\\d{4}$");
}
I have tested the regex string above against phone numbers from many different countries in many different formats, with success. It was also tested against many different credit card numbers in many different formats, again with success. Nevertheless, there will always be some format that causes a problem, since there are evidently no rules whatsoever for number entry at the source of the data.
Take the comment line I had shown at the top of this post:
And also the input string can be "my phone number 12345678 and credit card 1234567890"
There is no way to distinguish which number is supposed to be a phone number and which is supposed to be a credit card number unless the text within the string says so, as the string above does. Tomorrow or next week it might not, because there don't appear to be any data entry rules in play here.
The string indicates a phone number of 12345678, which is 8 digits. The string also indicates a credit card number of 1234567890. Internationally, phone numbers can range from 9 to as many as 13 digits depending on the country. Locally, the range of digit counts is smaller, again depending on the country. Because international phone numbers span such a wide range of lengths, there is no way to know that the number deemed to be a credit card number is in fact one unless the string tells you, either before the number or after it. Which will it be in the next input string, if it says so at all?
I leave it to you to decide how to deal with this situation, but whatever you choose, don't expect any great speed from it. As I wrote at the beginning of my previous answer:
Wouldn't it be nice if all validations were done before the card numbers
went into the database (or data files).
EDIT: Based on your latest comments under the earlier answer:
I whipped up a small demo:
// Place this code into a method or event somewhere...
String inputString = "my phone number is +54 123 344-4567 and CC 2222 4053 4324 8877 bla bla bla";
System.out.println("Input:  " + inputString);
System.out.println();
final java.util.List<String> numbers = new java.util.ArrayList<>();
final String regex = "(\\+?\\d+.{0,1}\\d+.{0,1}\\d+.{0,1}\\d+)|"         // Phone Numbers
                   + "(\\+{0,1}\\d+{0,3}\\s{0,1}\\-{0,1}\\({0,1}\\d+"   // Credit Cards
                   + "\\){0,1}\\s{0,1}\\-{0,1}\\d+\\s{0,1}\\-{0,1}\\d+)";
final java.util.regex.Pattern pattern = java.util.regex.Pattern.compile(regex);
final java.util.regex.Matcher matcher = pattern.matcher(inputString);
while (matcher.find()) {
    numbers.add(matcher.group());
}
String outputString = inputString;
for (String str : numbers) {
    //System.out.println(str); // Uncomment for testing.
    // Is substring a valid Phone Number?
    int len = str.replaceAll("\\D", "").length(); // Crushed number length
    if (isValidPhoneNumber(str)) {
        outputString = outputString.replace(str, maskAllExceptLast(str, 3, "x"));
    }
    else if (isValidCreditCardNumber(str)) {
        outputString = outputString.replace(str,
                maskAllExceptLast(str.replaceAll("\\D", ""), 4, "*"));
    }
}
System.out.println("Output: " + outputString);
Support methods....
public static String maskAllExceptLast(String inputString, int exceptLast_N, String... maskCharacter) {
    if (inputString.length() < exceptLast_N) {
        return inputString;
    }
    String mask = "*"; // Default mask character.
    if (maskCharacter.length > 0) {
        mask = maskCharacter[0];
    }
    return inputString.replaceAll(".(?=.{" + exceptLast_N + "})", mask);
}
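The lookahead in ".(?=.{N})" matches any character that still has at least N characters after it, so every character except the last N is replaced by the mask. Here is a minimal sketch that produces the same result for single-line input with a plain loop, which avoids building and compiling a regex on every call. The class name MaskSketch and the char mask parameter are assumptions made for this illustration.

```java
final class MaskSketch {
    /** Masks every character except the last keepLast ones, without using a regex. */
    static String maskAllExceptLast(String input, int keepLast, char mask) {
        int maskedLength = input.length() - keepLast;
        if (maskedLength <= 0) {
            return input; // nothing to mask
        }
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < maskedLength; i++) {
            sb.append(mask);
        }
        // Copy the trailing characters that stay visible.
        sb.append(input, maskedLength, input.length());
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints 11 asterisks followed by 8431.
        System.out.println(maskAllExceptLast("371449635398431", 4, '*'));
        // Prints 13 x characters followed by 567 (spaces and dashes are masked too).
        System.out.println(maskAllExceptLast("+54 123 344-4567", 3, 'x'));
    }
}
```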
/**
 * Method to validate a supplied phone number. Currently validates phone
 * numbers supplied in the following fashion:
 * <pre>
 *
 *      Phone number 1234567890 validation result: true
 *      Phone number 123-456-7890 validation result: true
 *      Phone number 123-456-7890 x1234 validation result: true
 *      Phone number 123-456-7890 ext1234 validation result: true
 *      Phone number (123)-456-7890 validation result: true
 *      Phone number 123.456.7890 validation result: true
 *      Phone number 123 456 7890 validation result: true
 *      Phone number 01 123 456 7890 validation result: true
 *      Phone number 1 123-456-7890 validation result: true
 *      Phone number 1-123-456-7890 validation result: true</pre>
 *
 * @param phoneNumber (String) The phone number to check.<br>
 *
 * @return (boolean) True is returned if the supplied phone number is valid.
 *         False if it isn't.
 */
public static boolean isValidPhoneNumber(String phoneNumber) {
    boolean isValid = false;
    int len = phoneNumber.replaceAll("\\D", "").length(); // Crush the phone number into only digits
    // Validating phone number with extension length from 3 to 5. This is checked
    // first because the extension pushes the digit count past the 12-digit limit below.
    if (phoneNumber.matches("\\d{3}-\\d{3}-\\d{4}\\s(x|(ext))\\d{3,5}")) {
        isValid = true;
    }
    // Check the phone number's length range. Must be from 8 to 12 digits long.
    else if (len < 8 || len > 12) {
        return false;
    }
    // Validate phone numbers of format "xxxxxxxx to xxxxxxxxxxxx"
    else if (phoneNumber.matches("\\d+")) {
        isValid = true;
    }
    // Validating phone number with -, . or spaces
    else if (phoneNumber.matches("^(\\+\\d{1,3}( )?)?((\\(\\d{1,3}\\))|\\d{1,3})[- .]?\\d{3,4}[- .]?\\d{4}$")) {
        isValid = true;
    }
    /* Validating phone number with -, . or spaces and long distance prefix.
       This regex also ensures:
         - The actual number (without LD prefix) should be 10 digits only.
         - For North American, numbers with area code may be surrounded
           with parentheses ().
         - The country code can be 1 to 3 digits long. Optionally may be
           preceded by a + sign.
         - There may be dashes, spaces, dots or no spaces between country
           code, area code and the rest of the number.
         - A valid phone number cannot be all zeros.  */
    else if (phoneNumber.matches("^(?!\\b(0)\\1+\\b)(\\+?\\d{1,3}[. -]?)?"
                               + "\\(?\\d{3}\\)?([. -]?)\\d{3}\\3\\d{4}$")) {
        isValid = true;
    }
    // Validating phone number where area code is in parentheses ()
    else if (phoneNumber.matches("^(\\(\\d{1,3}\\)|\\d{1,3})[- .]?\\d{2,4}[- .]?\\d{4}$")) {
        isValid = true;
    }
    // Return false if nothing matches the input
    else {
        isValid = false;
    }
    return isValid;
}
/**
 * Returns true if card (ie: MasterCard, Visa, etc) number is valid using
 * the 'Luhn Algorithm'. First this method validates for a correct Card
 * Network Number. The supported networks are:<pre>
 *
 *      Number              Card Network
 *      ====================================
 *      2                   Mastercard (BIN 2-Series) This is NEW!!
 *      30, 36, 38, 39      Diners-Club
 *      34, 37              American Express
 *      35                  JCB
 *      4                   Visa
 *      5                   Mastercard
 *      6                   Discover</pre><br>
 *
 * Next, the overall Credit Card number is checked with the 'Luhn Algorithm'
 * for validity.<br>
 *
 * @param cardNumber (String)
 *
 * @return (Boolean) True if valid, false if not.
 */
public static boolean isValidCreditCardNumber(String cardNumber) {
    if (cardNumber == null || cardNumber.trim().isEmpty()) {
        return false;
    }
    // Strip card number of all non-digit characters.
    cardNumber = cardNumber.replaceAll("\\D", "");
    int len = cardNumber.length();
    if (len < 14 || len > 16) {   // Only going to 16 digits here
        return false;
    }

    // Validate Card Network
    String[] cardNetworks = {"2", "30", "34", "35", "36", "37", "38", "39", "4", "5", "6"};
    String cardNetNum = cardNumber.substring(0, (cardNumber.startsWith("3") ? 2 : 1));
    boolean pass = false;
    for (String netNum : cardNetworks) {
        if (netNum.equals(cardNetNum)) {
            pass = true;
            break;
        }
    }
    if (!pass) {
        return false;   // Invalid Card Network
    }

    // Validate card number with the 'Luhn algorithm'.
    int nDigits = cardNumber.length();
    int nSum = 0;
    boolean isSecond = false;
    for (int i = nDigits - 1; i >= 0; i--) {
        int d = cardNumber.charAt(i) - '0';
        if (isSecond) {
            d = d * 2;
        }
        // Add the digits of d (d is at most 18 after doubling).
        nSum += d / 10;
        nSum += d % 10;
        isSecond = !isSecond;
    }
    return (nSum % 10 == 0);
}
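To make the Luhn step concrete: walking from the rightmost digit, every second digit is doubled, the digits of each result are summed, and the number passes when the total is a multiple of 10. The sketch below isolates just that step so it can be tried on its own. The class name LuhnSketch and method name passesLuhn are assumptions for illustration, and the input is assumed to contain digits only.

```java
final class LuhnSketch {
    /** Returns true if the digit string passes the Luhn checksum. */
    static boolean passesLuhn(String digits) {
        if (digits.isEmpty()) {
            return false;
        }
        int sum = 0;
        boolean doubleIt = false; // the rightmost digit is not doubled
        for (int i = digits.length() - 1; i >= 0; i--) {
            int d = digits.charAt(i) - '0';
            if (doubleIt) {
                d *= 2;
                if (d > 9) {
                    d -= 9; // same as adding the two digits of d
                }
            }
            sum += d;
            doubleIt = !doubleIt;
        }
        return sum % 10 == 0;
    }

    public static void main(String[] args) {
        System.out.println(passesLuhn("371449635398431"));  // true  (digit sum is 80)
        System.out.println(passesLuhn("371449635398432"));  // false (digit sum is 81)
        System.out.println(passesLuhn("2222405343248877")); // true  (digit sum is 70)
    }
}
```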
The demo code and support methods above will by no means be fast!
Tweak the regex or code to suit your specific needs.
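One tweak worth trying, if speed matters, is to mask in a single pass: compile the pattern once, and build the output with Matcher.appendReplacement instead of collecting matches and calling String.replace for each one. The following is only a sketch under stated assumptions. The class name SinglePassMasker, the simplified candidate pattern, and the Luhn-only check are illustrative stand-ins for the regex and validators above, and it handles card numbers only, not phone numbers.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SinglePassMasker {
    // Assumption: a simplified stand-in pattern for card candidates,
    // 13 to 19 digits, each optionally preceded by one space or dash.
    private static final Pattern CANDIDATE = Pattern.compile("\\d(?:[ -]?\\d){12,18}");

    /** Masks every Luhn-valid candidate in one scan of the text. */
    static String maskCards(String text) {
        Matcher m = CANDIDATE.matcher(text);
        StringBuffer out = new StringBuffer(text.length());
        while (m.find()) {
            String digits = digitsOnly(m.group());
            String replacement = passesLuhn(digits) ? maskAllButLast4(digits) : m.group();
            // quoteReplacement keeps '$' and '\' in the text from being treated specially.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    private static String digitsOnly(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= '0' && c <= '9') {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    private static String maskAllButLast4(String digits) {
        StringBuilder sb = new StringBuilder(digits.length());
        for (int i = 0; i < digits.length() - 4; i++) {
            sb.append('*');
        }
        return sb.append(digits, digits.length() - 4, digits.length()).toString();
    }

    // Stand-in for isValidCreditCardNumber: Luhn checksum only, no network check.
    private static boolean passesLuhn(String digits) {
        int sum = 0;
        boolean doubleIt = false;
        for (int i = digits.length() - 1; i >= 0; i--) {
            int d = digits.charAt(i) - '0';
            if (doubleIt) {
                d *= 2;
                if (d > 9) {
                    d -= 9;
                }
            }
            sum += d;
            doubleIt = !doubleIt;
        }
        return sum % 10 == 0;
    }

    public static void main(String[] args) {
        System.out.println(maskCards(
                "Valid American Express card with garbage characters: <3714-4963-5398-431>"));
        System.out.println(maskCards("Invalid credit card: 1234222222222"));
    }
}
```

Each input string is scanned once and copied once, regardless of how many candidates it contains, whereas repeated String.replace calls rescan and recopy the whole string for every match.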
Answered By - DevilsHnd
Answer Checked By - Mildred Charles (JavaFixing Admin)