By Mark Biek


2008-08-22 14:10:40 8 Comments

J0000000: Transaction A0001401 started on 8/22/2008 9:49:29 AM
J0000010: Project name: E:\foo.pf
J0000011: Job name: MBiek Direct Mail Test
J0000020: Document 1 - Completed successfully

I have this gigantic ugly string and I'm trying to extract pieces from it using regex.

In this case, I want to grab everything after "Project name:" up to the part where it says "J0000011:" (the 11 is going to be a different number every time).

Here's the regex I've been playing with

Project name:\s+(.*)\s+J[0-9]{7}:

The problem is that it doesn't stop until it hits the J0000020: at the end.

How do I make the regex stop at the first occurrence of J[0-9]{7}?
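The behavior can be reproduced with a short Python sketch. It assumes the whole log arrives as one long line with the entries separated by spaces (if each entry were on its own line, `.` would stop at the line break in most flavors and the problem would not show up). The names `LOG`, `GREEDY` and `greedy_capture` are made up for this illustration.

```python
import re

# Assumption: the log is a single line, entries separated by spaces.
LOG = (
    r"J0000000: Transaction A0001401 started on 8/22/2008 9:49:29 AM "
    r"J0000010: Project name: E:\foo.pf "
    r"J0000011: Job name: MBiek Direct Mail Test "
    r"J0000020: Document 1 - Completed successfully"
)

# The pattern from the question, with a greedy (.*) capture.
GREEDY = re.compile(r"Project name:\s+(.*)\s+J[0-9]{7}:")


def greedy_capture(text: str) -> str:
    """Return group 1 of the greedy pattern, or "" when nothing matches."""
    match = GREEDY.search(text)
    return match.group(1) if match else ""


# The capture runs past J0000011: and only stops before the last J-number.
print(greedy_capture(LOG))
# E:\foo.pf J0000011: Job name: MBiek Direct Mail Test
```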

5 comments

@Konrad Rudolph 2008-08-22 14:15:57

Using non-greedy quantifiers here is probably the best solution, also because it is more efficient than the greedy alternative: Greedy matches generally go as far as they can (here, until the end of the text!) and then trace back character after character to try and match the part coming afterwards.

However, consider using a negative character class instead:

Project name:\s+(\S*)\s+J[0-9]{7}:

\S means “everything except whitespace”, and this is exactly what you want.
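As a rough check, here is the \S version in Python. The second input, with a space inside the path, is a hypothetical example added to show where this approach stops working: \S* cannot cross the space, so the pattern finds no match at all. `project_name`, `one_line` and `with_space` are illustrative names only.

```python
import re

# Negated-class version: the capture can only contain non-whitespace.
PATTERN = re.compile(r"Project name:\s+(\S*)\s+J[0-9]{7}:")


def project_name(text):
    """Return the captured project path, or None when the pattern fails."""
    match = PATTERN.search(text)
    return match.group(1) if match else None


one_line = (
    r"J0000010: Project name: E:\foo.pf "
    r"J0000011: Job name: MBiek Direct Mail Test "
    r"J0000020: Document 1 - Completed successfully"
)
# Hypothetical path containing a space (not from the original log).
with_space = r"J0000010: Project name: E:\my docs\foo.pf J0000011: Job name: Test"

print(project_name(one_line))    # E:\foo.pf
print(project_name(with_space))  # None
```

So the character class is a good fit as long as the project path never contains whitespace.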

@CertainPerformance 2018-10-30 09:26:16

When possible to implement, a greedy negative (or positive) character class will usually perform notably better than a lazy quantifier. Laziness requires the engine to forward-track character by character, checking the pattern that follows each time until it matches; a greedy character class can mindlessly repeat just the desired characters, which can be a lot quicker. So, you might consider making a stronger case for a negative character class, seeing as this is the greedy-vs-lazy canonical.
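One way to check this claim on your own engine is to time both patterns on the same input. The sketch below uses Python's `timeit`; the long, space-free input and the names `LAZY`, `NEGATED`, `TEXT` and `time_pattern` are assumptions made for the test, and the numbers will vary by machine and regex flavor, so no particular result is implied.

```python
import re
import timeit

LAZY = re.compile(r"Project name:\s+(.*?)\s+J[0-9]{7}:")
NEGATED = re.compile(r"Project name:\s+(\S*)\s+J[0-9]{7}:")

# Hypothetical input: a very long project name without any whitespace.
TEXT = "Project name: " + "x" * 10_000 + " J0000011: Job name: Test"


def time_pattern(pattern, repeats=200):
    """Return the total seconds spent running pattern.search(TEXT) repeatedly."""
    return timeit.timeit(lambda: pattern.search(TEXT), number=repeats)


lazy_seconds = time_pattern(LAZY)
negated_seconds = time_pattern(NEGATED)

print(f"lazy (.*?):    {lazy_seconds:.4f}s")
print(f"negated (\\S*): {negated_seconds:.4f}s")
```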

@Shailendra 2018-07-16 08:05:39

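(Project name:\s+[A-Z]:(?:\\\w+)+\.[a-zA-Z]+\s+J[0-9]{7})(?=:)

This will work for you.

Using (?:\\\w+)+\.[a-zA-Z]+ instead of .* makes the pattern more restrictive: it only accepts a drive letter, one or more backslash-separated path segments, and a file extension.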

@Svend 2008-08-22 14:24:12

Well, ".*" is a greedy selector. You make it non-greedy by using ".*?". With the latter construct, before the regex engine lets the "." consume another character, it first attempts to match whatever may come after the ".*?". This means that if, for instance, nothing comes after the ".*?", then it matches nothing.

Here's what I used. s contains your original string. This code is .NET specific, but most flavors of regex will have something similar.

string m = Regex.Match(s, @"Project name: (?<name>.*?) J\d+").Groups["name"].Value;
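For readers outside .NET, a rough Python equivalent might look like the following. It assumes, as the pattern above does, that the entries in `s` are separated by single spaces; the variable names are only for illustration. The second search shows the point about a lazy quantifier with nothing after it: it is satisfied by zero characters.

```python
import re

# Assumption: entries are separated by single spaces, as the pattern expects.
s = (
    r"J0000010: Project name: E:\foo.pf "
    r"J0000011: Job name: MBiek Direct Mail Test "
    r"J0000020: Document 1 - Completed successfully"
)

# Python writes a named group as (?P<name>...) where .NET uses (?<name>...).
match = re.search(r"Project name: (?P<name>.*?) J\d+", s)
name = match.group("name") if match else ""

# Nothing follows the lazy quantifier here, so it captures the empty string.
empty = re.search(r"Project name: (?P<name>.*?)", s).group("name")

print(name)         # E:\foo.pf
print(repr(empty))  # ''
```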

@jj33 2008-08-22 14:12:01

Make .* non-greedy by adding '?' after it:

Project name:\s+(.*?)\s+J[0-9]{7}:
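A quick Python check of this pattern, as a sketch: `ENTRIES` and `project_name` are names invented here, and the log is tried both as one space-separated line and with one entry per line, since the thread does not say which form the real string has.

```python
import re

# The lazy pattern from this answer.
LAZY = re.compile(r"Project name:\s+(.*?)\s+J[0-9]{7}:")

ENTRIES = [
    r"J0000000: Transaction A0001401 started on 8/22/2008 9:49:29 AM",
    r"J0000010: Project name: E:\foo.pf",
    r"J0000011: Job name: MBiek Direct Mail Test",
    r"J0000020: Document 1 - Completed successfully",
]


def project_name(text):
    """Return the lazily captured project path, or None if nothing matches."""
    match = LAZY.search(text)
    return match.group(1) if match else None


print(project_name(" ".join(ENTRIES)))   # E:\foo.pf
print(project_name("\n".join(ENTRIES)))  # E:\foo.pf
```

Because the separator is matched with \s+, the same pattern handles both layouts.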

@Dr Manhattan 2019-10-11 08:00:10

That is the most awesome regex tip I've come across.

@Hershi 2008-08-22 14:17:21

I would also recommend you experiment with regular expressions using "Expresso" - it's a great (and free) utility for regex editing and testing.

One of its upsides is that its UI exposes a lot of regex functionality that people inexperienced with regex might not be familiar with, in a way that makes these new concepts easy to learn.

For example, when building your regex using the UI, and choosing "*", you have the ability to check the checkbox "As few as possible" and see the resulting regex, as well as test its behavior, even if you were unfamiliar with non-greedy expressions before.

Available for download at their site: http://www.ultrapico.com/Expresso.htm

Express download: http://www.ultrapico.com/ExpressoDownload.htm

@Matt M. 2018-11-18 04:08:14

There are a few great websites out there already. I'd rather visit a bookmark than have another program on my computer.
