Remove all special characters except percentage (real time use case)?

How to remove all special characters except percentage (real time use case)?

If you run a coupon website, you will hit this issue sooner or later while parsing or scraping coupons from merchants.

 

For example, you will have these four types of strings:
1. Get 10% Cashback
2. Get Rs. 10/- Cashback
3. Get 10/- Cashback
4. Get ₹ 10 Cashback
If you use the regex pattern below, it removes all the special characters. The space inside the character class is what keeps the spaces between words:

[ ](?=[ ])|[^-_,A-Za-z0-9 ]+
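To see what the two halves of that pattern do, here is a minimal sketch. The class name SpaceCollapseDemo and the helper strip are made up for this illustration, and the pattern is written with a space kept inside the character class so that words stay separated.

```java
class SpaceCollapseDemo {

    // The pattern from the text, with a space kept inside the character class.
    static final String PATTERN = "[ ](?=[ ])|[^-_,A-Za-z0-9 ]+";

    // Hypothetical helper: removes everything the pattern matches.
    static String strip(String input) {
        return input.replaceAll(PATTERN, "");
    }

    public static void main(String[] args) {
        // First alternative: a space followed by another space is removed,
        // so runs of spaces already present in the input shrink to one.
        System.out.println("[" + strip("Get   10%   Cashback") + "]");

        // The spaces on either side of a removed symbol are not adjacent in
        // the input, so both survive and the result has a double space.
        System.out.println("[" + strip("Get \u20B9 10 Cashback") + "]");
    }
}
```

The second case is worth keeping in mind: replaceAll works on the original input in a single pass, so a second replacement such as replaceAll(" {2,}", " ") is one way to tidy up spaces left behind by removed symbols.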

 

 

Now your sample strings will become like this,

1. Get 10 Cashback
2. Get Rs 10- Cashback
3. Get 10- Cashback
4. Get 10 Cashback

Now strings 1 and 4 give the same output, which is wrong. Both 10% and ₹ 10 end up as plain 10 because the regex removes every special character, so a percentage cashback can no longer be told apart from a flat rupee amount.
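The collision is easy to reproduce. This is only a sketch: the class name CollisionDemo and the helper stripAll are invented for the example, and a second replaceAll is added to collapse the spaces left behind by the removed symbol.

```java
class CollisionDemo {

    // Hypothetical helper: removes every character that is not a letter,
    // digit, space, '-', '_' or ',', then collapses repeated spaces.
    static String stripAll(String input) {
        return input.replaceAll("[^-_,A-Za-z0-9 ]+", "")
                    .replaceAll(" {2,}", " ");
    }

    public static void main(String[] args) {
        String percent = "Get 10% Cashback";
        String rupees = "Get \u20B9 10 Cashback"; // U+20B9 is the rupee sign

        // Both offers collapse to the same text once '%' and the rupee sign are gone.
        System.out.println(stripAll(percent));
        System.out.println(stripAll(rupees));
        System.out.println(stripAll(percent).equals(stripAll(rupees)));
    }
}
```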

 

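But if you rewrite the pattern to remove every special character except the percentage sign (%), you will not face this issue:

[ ](?=[ ])|[^-_,A-Za-z0-9 %]+

 

Adding % to the negated character class is all it takes to keep it. No escape such as ?! is needed: inside a character class, ? and ! are ordinary literal characters, so writing ?!% there would keep ? and ! as well as %.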
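A quick check of how ? and ! behave inside a character class may be useful here. In Java regex they are plain literals in that position, not an escape or a lookahead. The sketch below uses a made-up class LiteralsInClassDemo and a made-up helper keepOnly that builds the negated class from a "keep list".

```java
class LiteralsInClassDemo {

    // Hypothetical helper: removes every character not listed in keepClass.
    static String keepOnly(String input, String keepClass) {
        return input.replaceAll("[^" + keepClass + "]+", "");
    }

    public static void main(String[] args) {
        String input = "Really? Get 10% off!";

        // With "?!%" in the class, '?' and '!' are kept as well as '%'.
        System.out.println(keepOnly(input, "-_,A-Za-z0-9 ?!%"));

        // With only '%' added, '?' and '!' are removed and '%' is still kept.
        System.out.println(keepOnly(input, "-_,A-Za-z0-9 %"));
    }
}
```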

 

Now it is:
1. Get 10% Cashback
2. Get Rs 10- Cashback
3. Get 10- Cashback
4. Get 10 Cashback
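Changing it a bit more keeps /- and Rs. as well. A character class works on single characters, so this means adding . and / to it (the letters R and s and the hyphen are already kept):

[ ](?=[ ])|[^-_,A-Za-z0-9 %./]+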

Now it is:
1. Get 10% Cashback
2. Get Rs. 10/- Cashback
3. Get 10/- Cashback
4. Get 10 Cashback
Now it is perfect.

Full source code of the above regex example:

[java]
package in.javadomain;

public class RegexExceptPercentage {

    public static void main(String[] args) {

        String str1 = "Get 10% Cashback";
        String str2 = "Get Rs. 10/- Cashback";
        String str3 = "Get 10/- Cashback";
        // U+20B9 is the Indian rupee sign
        String str4 = "Get \u20B9 10 Cashback";

        // Remove special characters but keep the percentage sign
        String afterRem1 = str1.replaceAll("[ ](?=[ ])|[^-_,A-Za-z0-9 %]+", "");
        System.out.println(afterRem1);

        // Keep '.' and '/' so that "Rs." and "/-" survive
        String afterRem2 = str2.replaceAll("[ ](?=[ ])|[^-_,A-Za-z0-9 ./]+", "");
        System.out.println(afterRem2);

        // Keep '/' so that "/-" survives
        String afterRem3 = str3.replaceAll("[ ](?=[ ])|[^-_,A-Za-z0-9 /]+", "");
        System.out.println(afterRem3);

        // Remove every special character, then collapse the double space
        // left behind where the rupee sign used to be
        String afterRem4 = str4.replaceAll("[ ](?=[ ])|[^-_,A-Za-z0-9 ]+", "")
                .replaceAll(" {2,}", " ");
        System.out.println(afterRem4);
    }
}

[/java]

Output:

[plain]
Get 10% Cashback
Get Rs. 10/- Cashback
Get 10/- Cashback
Get 10 Cashback
[/plain]
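In a real scraper you usually do not know in advance which of the four shapes a string has, so it can be handy to run one pattern over every title. The following is a sketch under that assumption, not a drop-in implementation: the class CouponCleaner, the method clean and the fifth sample title are made up for illustration. Note that the rupee sign is still dropped, as in the output above. If it should survive, \u20B9 could be added to the character class.

```java
import java.util.regex.Pattern;

class CouponCleaner {

    // Everything except letters, digits, space, '-', '_', ',', '%', '.' and '/'.
    private static final Pattern UNWANTED = Pattern.compile("[^-_,A-Za-z0-9 %./]+");
    private static final Pattern EXTRA_SPACES = Pattern.compile(" {2,}");

    // Hypothetical helper: one cleaning step for any scraped coupon title.
    static String clean(String title) {
        String stripped = UNWANTED.matcher(title).replaceAll("");
        return EXTRA_SPACES.matcher(stripped).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String[] titles = {
            "Get 10% Cashback",
            "Get Rs. 10/- Cashback",
            "Get 10/- Cashback",
            "Get \u20B9 10 Cashback",
            "Get 10% Cashback!!  (Limited*)"
        };
        for (String title : titles) {
            System.out.println(clean(title));
        }
    }
}
```

Compiling the patterns once as constants avoids recompiling the regex for every title, which matters when many coupons are processed in a loop.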
