Post

(κ΅¬ν˜„) KGrep

(κ΅¬ν˜„) KGrep

πŸ“ μž‘μ„± λ°°κ²½

μ΅œκ·Όμ— μ΄λŸ°μ €λŸ° 이유둜 라이브 μ½”λ”©ν…ŒμŠ€νŠΈμ— 관심을 κ°–κ²Œ λ˜μ—ˆλŠ”λ°, μΏ ν‚€λŸ°μœΌλ‘œ 유λͺ…ν•œ λ°λΈŒμ‹œμŠ€ν„°μ¦ˆ λΌλŠ” νšŒμ‚¬μ— 라이브 μ½”ν…Œκ°€ 유λͺ…ν•˜λ‹€κ³ 
μ–΄λ””μ„œ λ“€μ–΄μ„œ μ–΄λ–€ μ‹μœΌλ‘œ μ§„ν–‰λ˜λŠ”μ§€ κΆκΈˆν•˜μ—¬ μ°Ύμ•„λ³΄λ˜ 쀑 λ°λΈŒμ‹œμŠ€ν„°μ¦ˆμ—μ„œ 22년도에 μž‘μ„±λœ κ²Œμ‹œκΈ€μ„ ν•˜λ‚˜ λ°œκ²¬ν•˜κ²Œ λ˜μ—ˆκ³ ,
ν•΄λ‹Ή κ²Œμ‹œκΈ€μ— μžˆλŠ” λ¬Έμ œμ€‘ grep 을 직접 κ΅¬ν˜„ν•΄λ³΄λŠ” λ¬Έμ œκ°€ μžˆμ–΄ 주말에 잠깐 ν’€μ–΄λ΄€λŠ”λ° 재미있게 ν’€μ—ˆλ˜ 기얡이 μžˆμ–΄
ν•΄λ‹Ή λ‚΄μš©μ— λŒ€ν•΄ μž‘μ„±ν•΄λ³΄λ €κ³  ν•œλ‹€.

λ¬Έμ œμ— λŒ€ν•œ μžμ„Έν•œ λ‚΄μš© 및 κΆκΈˆν•œ λ‚΄μš©μ€ μ•„λž˜ λ°λΈŒμ‹œμŠ€ν„°μ¦ˆ κ²Œμ‹œκΈ€μ„ μ°Έκ³ ν•˜λ©΄ 쒋을것 κ°™λ‹€.


πŸ‘“ μ„  3쀄 μš”μ•½

  • Kotlin으둜 grepκ³Ό μœ μ‚¬ν•œ κΈ°λŠ₯을 ν•˜λŠ” λ¬Έμžμ—΄ 검색 도ꡬλ₯Ό λ¨Όμ € μ‹±κΈ€μŠ€λ ˆλ“œλ‘œ κ΅¬ν˜„ν•œ ν›„ λ©€ν‹°μŠ€λ ˆλ“œλ‘œ κ°œμ„ ν•˜μ˜€λ‹€.
  • 파일 탐색과 λ¬Έμžμ—΄ 검색을 λΆ„λ¦¬ν•˜κ³  파일 λ‹¨μœ„λ‘œ μŠ€λ ˆλ“œλ₯Ό ν• λ‹Ήν•˜λŠ” λ°©μ‹μœΌλ‘œ 병렬 처리λ₯Ό κ΅¬ν˜„ν•˜μ˜€λ‹€.
  • λ©€ν‹°μŠ€λ ˆλ“œ 버전(KGrepV2)은 원본 grep보닀 μ•½ 88% λΉ λ₯Έ 속도(1.16초 vs 9.69초)λ₯Ό λ³΄μ˜€μœΌλ‚˜, CPU μžμ›μ€ μ•½ 8.5λ°°(832% vs 98%) 더 많이 μ‚¬μš©ν•œλ‹€.

πŸ”οΈ 문제 μš”κ΅¬μ‚¬ν•­

μΏ ν‚€λŸ° νšŒμ‚¬ λ‹΅κ²Œ λ¬Έμ œμ—λ„ μΏ ν‚€κ°€ λ“±μž₯ ν•˜λŠ”κ±Έ λ³Ό 수 μžˆλ‹€.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
μš©κ°ν•œ μΏ ν‚€λŠ” ν‰μ†Œ μ‚¬μš©ν•˜λŠ” 파일 λ‚΄ λ¬Έμžμ—΄ 검색 툴 grep보닀 더 λΉ λ₯Έ 속도λ₯Ό μžλž‘ν•˜λŠ” ackλ‚˜ ag같은 λ‹€μ–‘ν•œ νˆ΄λ“€μ΄ μžˆλ‹€λŠ” 것을 λ“€μ—ˆλ‹€.
μžμ‹ λ§Œμ˜ dgrep을 λ§Œλ“€μ–΄ 널리 세상을 이둭게 ν•˜κ³  μ‹Άλ‹€κ³  μƒκ°ν•˜κ²Œ 된 μš©κ°ν•œ μΏ ν‚€λŠ” κ·Έ 첫 λ‹¨κ³„λ‘œ λ©€ν‹°μŠ€λ ˆλ“œλ‘œ λ™μž‘ν•˜λŠ” grep을 직접 λ§Œλ“€μ–΄λ³΄κΈ°λ‘œ κ²°μ‹¬ν•˜μ˜€λ‹€.
μ›ν•˜λŠ” μŠ€νŽ™μ— λ§žλŠ” 파일 λ‚΄ λ¬Έμžμ—΄ 검색 νˆ΄μ„ λ§Œλ“€μ–΄λ³΄μž.

μž…λ ₯ ν˜•μ‹
 - dgrep {keyword} {relative path}

좜λ ₯ ν˜•μ‹
 - 파일의 각 line에 keywordκ°€ μžˆλŠ” 경우, ν•΄λ‹Ή 파일과 쀄 번호λ₯Ό 좜λ ₯ν•œλ‹€.

쑰건
1. relative pathκ°€ 디렉토리인 경우 디렉토리 λ‚΄ λͺ¨λ“  νŒŒμΌμ— λŒ€ν•΄ 검사λ₯Ό μ§„ν–‰ν•œλ‹€.
2. relative path 내에 또 λ‹€λ₯Έ 디렉토리가 μ‘΄μž¬ν•˜λŠ” 경우, 각 디렉토리 λ‚΄ λͺ¨λ“  νŒŒμΌμ— λŒ€ν•œ 검사 λ˜ν•œ μ§„ν–‰ν•œλ‹€.
3. λ©€ν‹° μŠ€λ ˆλ“œλ₯Ό μ΄μš©ν•˜μ—¬ μ΅œλŒ€ν•œ λΉ λ₯΄κ²Œ μž‘μ—…μ„ μ™„λ£Œν•˜λ„λ‘ μž‘μ„±ν•œλ‹€.
4. λ™μΌν•œ νŒŒμΌμ— λŒ€ν•œ 검사 κ²°κ³ΌλŠ” ν•œ λ²ˆμ— 좜λ ₯λ˜μ–΄μ•Ό ν•œλ‹€.
5. Directory λ‚΄ symlinkλŠ” μ—†λ‹€κ³  κ°€μ •ν•œλ‹€.
6. νŒŒμΌλ“€μ€ λͺ¨λ‘ UTF8 μΈμ½”λ”©μœΌλ‘œ μž‘μ„±λœ Text파일이라고 κ°€μ •ν•œλ‹€.

많이 μ‚¬μš©λ˜λŠ” grep μ΄λΌλŠ” 툴 보닀 쑰금 더 λΉ λ₯΄κ²Œ λ¬Έμžμ—΄μ„ 검색할 수 μžˆλŠ” κΈ°λŠ₯을 갖은 dgrep 을 λ§Œλ“€λΌλŠ” μš”κ΅¬μ‚¬ν•­μ΄λ‹€. grep 의 λ§Žμ€ κΈ°λŠ₯λ“€ 쀑 μ‹¬ν”Œν•œ keyword 쑰회 κΈ°λŠ₯에 λŒ€ν•΄ κ΅¬ν˜„ ν•˜λ©΄ λ˜λŠ” 문제라고 μƒκ°ν–ˆλ‹€.

➑️ μ ‘κ·Ό λ°©ν–₯

grep 은 ν‰μ†Œμ—λ„ 많이 μ‚¬μš©ν•˜κ³  μžˆλŠ” 툴이라 μš”κ΅¬μ‚¬ν•­μ„ μ΄ν•΄ν•˜λŠ”λ° 어렀움이 μžˆμ§„ μ•Šμ•˜λ‹€.

κ°€μž₯ λ¨Όμ € μƒκ°ν•œ 뢀뢄은 μ–΄λ–»κ²Œ ν•˜λ©΄ λ‚΄κ°€ μ£Όλ ₯으둜 μ‚¬μš©ν•˜λŠ” μ–Έμ–΄μ—μ„œ grep κ³Ό 같은 λ™μž‘μ„ ν•˜λŠ” ν”„λ‘œκ·Έλž¨μ„ λ§Œλ“€μˆ˜ μžˆμ„μ§€μ— λŒ€ν•œ λΆ€λΆ„μ΄μ˜€λ‹€. λ‹Ήμ—°νžˆ grep 보닀 빨라야 ν•˜λŠ”λ°, μ²˜μŒλΆ€ν„° 빠름에 집쀑을 ν•˜λ©΄ 거기에 λ„ˆλ¬΄ λ§€λͺ°λ κ²ƒ κ°™μ•„. μš°μ„  grep 처럼 λ™μž‘ν•˜λŠ” ν”„λ‘œκ·Έλž¨μ„ λ§Œλ“€κ³  속도λ₯Ό κ°œμ„ μ‹œν‚€λŠ” λ°©ν–₯으둜 μ§„ν–‰ν•˜λŠ”κ²Œ 쒋을것 κ°™λ‹€κ³  μƒκ°ν–ˆλ‹€.

✨ κ΅¬ν˜„

μ‹±κΈ€ μŠ€λ ˆλ“œ

kotlin 을 μ‚¬μš©ν•΄μ„œ κ΅¬ν˜„ν–ˆλ‹€. 이름도 살짝 λ³€κ²½ν•΄λ΄€λ‹€.(dgrep -> kgrep

1
2
3
4
5
6
7
8
9
10
class KGrep(
    private val keyword: String,
    private val path: String
) { }

fun main(args: Array<String>) {
    val keyword: String = args[0]
    val path: String = args[1]
    val kGrep = KGrep(keyword, path)
}

μš°μ„  KGrep μ΄λΌλŠ” 클래슀λ₯Ό μ„ μ–Έν–ˆλ‹€. ν•΄λ‹Ή ν΄λž˜μŠ€λŠ” keyword와 path λ‘κ°œμ˜ properties λ₯Ό κ°–κ³  μžˆλ‹€. κ·Έ 이후 main ν•¨μˆ˜λ₯Ό λ§Œλ“€μ–΄μ€¬λ‹€. KGrep μ• ν”Œλ¦¬μΌ€μ΄μ…˜μ„ μ‹€ν–‰ ν–ˆμ„λ•Œ λ™μž‘μ˜ 주체가 λ˜λŠ” ν•¨μˆ˜λ‹€. Java 도 κ·Έλ ‡κ³  Kotlin 도 κ·Έλ ‡κ³  main ν•¨μˆ˜μ—μ„œ args 즉 arguments 리슀트λ₯Ό νŒŒλΌλ―Έν„°λ‘œ 받을 수 μžˆλŠ”λ° 이 뢀뢄을 적극 ν™œμš©ν–ˆλ‹€. μ΄λ ‡κ²Œ 선언을 ν•˜κ²Œλ˜λ©΄ jar νŒŒμΌμ„ μ‹€ν–‰ ν–ˆμ„λ•Œ 뒀에 νŒŒλ¦¬λ―Έν„°λ₯Ό λ„μ–΄μ“°κΈ°λ‘œ κ΅¬λΆ„ν•˜μ—¬ λ„˜κ²¨μ€„ 수 있게 λœλ‹€.

1
java -jar KGrep.jar "keyword" "path"

μœ„μ˜ μ˜ˆμ‹œμ²˜λŸΌ KGrep.jar νŒŒμΌμ„ μ‹€ν–‰ μ‹œν‚¬λ•Œ 뒀에 keyword 와 path λ₯Ό 넣어쀄 수 있고, 이걸 main ν•¨μˆ˜μ—μ„œ μ‚¬μš©ν•  수 있게 λœλ‹€.

λ‹€μŒμœΌλ‘œλŠ” main ν•¨μˆ˜μ—μ„œ 호좜 ν•  KGrep λ‚΄λΆ€μ˜ search λ©”μ„œλ“œλ₯Ό μ„ μ–Έν–ˆλ‹€.

1
2
3
4
5
6
7
8
9
10
class KGrep(
    private val keyword: String,
    private val path: String
) {
    fun search() {
        val startPath = File(path)
    }
}

// ...

μ—¬κΈ°μ„œ λΆ€ν„° κ³ λ―Όλ˜λŠ” 뢀뢄이 μžˆμ—ˆλŠ”λ°, μš”κ΅¬μ‚¬ν•­μ„ 보면 디렉토리 내뢀에 또 λ‹€λ₯Έ 디렉토리가 μžˆμ„ 수 있으며, λ‚΄λΆ€μ μœΌλ‘œ λ“€μ–΄μžˆλŠ” 디렉토리에 λŒ€ν•΄μ„œλ„ λ¬Έμžμ—΄ 검색을 ν•΄μ•Όν•œλ‹€λŠ” λ‚΄μš©μ΄ μžˆλ‹€.

μ—¬κΈ°μ„œ λ“€μ—ˆλ˜ 고민이 μžˆμ—ˆλ‹€.

  1. path μ•ˆμ— μžˆλŠ” λͺ¨λ“  디렉토리에 λŒ€ν•΄ file 탐색을 λ¨Όμ € μ§„ν–‰ν•˜κ³ , ν•΄λ‹Ή file 듀에 λŒ€ν•΄ 순차적으둜 λ¬Έμžμ—΄ 검색을 ν• μ§€
  2. file 탐색과 λ¬Έμžμ—΄ 검색을 ν•¨κ»˜ μ§„ν–‰ν• μ§€

λ‚˜λŠ” 1번 방법을 μ„ νƒν•˜κΈ°λ‘œ ν–ˆλ‹€. κ°€μž₯ 큰 μ΄μœ λ‘œλŠ” 글을 쒀더 보닀보면 λ‚˜μ˜¬ν…λ° λ©€ν‹° μŠ€λ ˆλ“œμ— λŒ€ν•œ κ³ λ €λ₯Ό μ•ˆν•  수 μ—†μ—ˆκΈ° λ•Œλ¬Έμ΄λ‹€.
λ‹¨μˆœ CLI λ‘€ 톡해 λ™μž‘ν•˜λŠ” ν”„λ‘œκ·Έλž¨μ„ λ§Œλ“œλŠ” 거라면 2번 방법이 더 적합 ν•˜λ‹€ μƒκ°ν–ˆκ³  κ·Έλ ‡κ²Œ κ΅¬ν˜„μ„ ν–ˆμ„κ²ƒ 같은데
λ©€ν‹° μŠ€λ ˆλ“œ ν™˜κ²½μ—μ„œ κ΅¬ν˜„μ„ λΉ λ₯΄κ³  κ°„λ‹¨ν•˜κ²Œ ν•˜κΈ° μœ„ν•΄ 1번 방법을 μ„ νƒν•˜κ²Œ λ˜μ—ˆλ‹€.

λ‹¨μˆœ CLI λ₯Ό 톡해 λ™μž‘ν•˜λŠ” ν”„λ‘œκ·Έλž¨μ€ 2λ²ˆμ„ μ„ νƒν–ˆμ„κ±°λΌ ν–ˆλŠ”λ° κ·Έ μ΄μœ λŠ” 1번 방법을 μ‚¬μš©ν•˜λ©΄ μ•„λž˜μ™€ 같은 뢀뢄이 λ§ˆμŒμ— κ±Έλ ΈκΈ° λ•Œλ¬Έμ΄λ‹€.

  1. 디렉토리 λ‚΄λΆ€μ˜ λͺ¨λ“  file 을 λ©”λͺ¨λ¦¬μ— 적재 μ‹œμΌœμ•Ό ν•œλ‹€.
  2. file 탐색이 λλ‚ λ•ŒκΉŒμ§€ λ¬Έμžμ—΄ 탐색을 μ§„ν–‰ν•˜μ§€ μ•ŠκΈ° λ•Œλ¬Έμ— κ²°κ³Όλ¬Ό 좜λ ₯이 λŠλ¦¬λ‹€.

λ‹€μ‹œ λŒμ•„κ°€μ„œ search λ©”μ„œλ“œμ˜ λ‚΄λΆ€λ₯Ό 확인해 보자.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
private val filesToProcess: Queue<File> = LinkedList()

fun search() {
    val startPath = File(path)
    collectFiles(startPath)
}

private fun collectFiles(startPath: File) {
    when {
        startPath.isFile -> filesToProcess.add(startPath)
        startPath.isDirectory -> startPath.listFiles()?.forEach { collectFiles(it) }
        else -> println("Cannot process this path: ${startPath.absolutePath}")
    }
}

Queue에 디렉토리λ₯Ό νƒμƒ‰ν•˜μ—¬ file 을 μ €μž₯ν•˜λŠ” collectFiles λ©”μ„œλ“œλ₯Ό μΆ”κ°€ν–ˆλ‹€.
startPath λ₯Ό μ‘°μ‚¬ν•΄μ„œ 파일이라면 Queue 에 적재λ₯Ό, 디렉토리면 listFiles λ©”μ„œλ“œλ₯Ό 톡해 ν•΄λ‹Ή 디렉토리에 μžˆλŠ” λͺ¨λ“  file 및 디렉토리λ₯Ό κ°–κ³  온 λ’€ forEach 와 μž¬κ·€λ₯Ό 톡해 file 리슀트λ₯Ό λ‹€μ‹œ ν•œλ²ˆ νƒμƒ‰ν•˜κ²Œλœλ‹€.

λ‹€μŒμœΌλ‘œλŠ” μ €μž₯ν•œ file에 λŒ€ν•΄ λ¬Έμžμ—΄μ„ κ²€μƒ‰ν•˜λŠ” searchInFile λ©”μ„œλ“œλ₯Ό μΆ”κ°€ν–ˆλ‹€.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
private val results = HashMap<String, List<Pair<Int, String>>>()

private fun searchInFile(file: File) {
    if (!file.isFile || !file.canRead()) {
      return  
    }
    val matchingLines: MutableList<Pair<Int, String>> = mutableListOf()
    file.useLines { lines ->
        lines.forEachIndexed { index, line ->
            if (line.contains(keyword)) {
                matchingLines.add((index + 1) to line.trim())
            }
        }
    }
    if (matchingLines.isNotEmpty()) {
        results[file.path] = matchingLines
    }
}

νŒŒλΌλ―Έν„°λ‘œ λ„˜κ²¨ 받은 file 에 λŒ€ν•΄ κ°„λ‹¨ν•˜κ²Œ 검증을 μ§„ν–‰ ν•œ λ’€ νŠΉμ • 포멧에 맞게 결과물을 좜λ ₯ν•˜κΈ° μœ„ν•΄ matchingLines λ¦¬μŠ€νŠΈμ— 값을 μ €μž₯ν•˜κ³  μžˆλ‹€.
kotlin μ—μ„œλŠ” 기본적으둜 정말 λ‹€μ–‘ν•œ ν•¨μˆ˜λ“€μ΄ 제곡되고 μžˆλŠ”λ° ν•΄λ‹Ή λ©”μ„œλ“œμ—μ„œ μ—¬λŸ¬ ν•¨μˆ˜λ“€μ„ μ‚¬μš©ν•˜μ—¬ μ†μ‰½κ²Œ κ΅¬ν˜„μ΄ κ°€λŠ₯ν–ˆλ‹€.

  • useLines : file 을 bufferedReader 둜 λΌμΈλ‹¨μœ„λ‘œ μ²˜λ¦¬ν•˜μ—¬ 각 라인 λ§ˆλ‹€ Sequence<String> ν˜•νƒœλ‘œ μ ‘κ·Όν•  수 μžˆλ„λ‘ ν•΄μ£ΌλŠ” ν•¨μˆ˜μ΄λ‹€. λ‚΄λΆ€μ—μ„œ use λ₯Ό μ‚¬μš©ν•˜κ³  있기 λ•Œλ¬Έμ— λ³„λ„λ‘œ λ¦¬μ†ŒμŠ€ 정리λ₯Ό ν•  ν•„μš”κ°€ μ—†μ–΄ 맀우 νŽΈλ¦¬ν•˜λ‹€.
  • forEachIndexed : 순회λ₯Ό ν•˜λ©΄μ„œ κ°’κ³Ό index λ₯Ό ν•¨κ»˜ μ‚¬μš©ν•  수 있게 ν•΄μ£ΌλŠ” ν•¨μˆ˜μ΄λ‹€.
  • contains : νŠΉμ • 값을 ν¬ν•¨ν•˜κ³  μžˆλŠ”μ§€ 확인할 수 μžˆλ„λ‘ ν•΄μ£ΌλŠ” ν•¨μˆ˜μ΄λ‹€. μœ„ 3개의 ν•¨μˆ˜λ₯Ό μ‚¬μš©ν–ˆκ³ , 탐색을 톡해 κ²€μƒ‰ν•˜κ³ μž ν•˜λŠ” λ¬Έμžμ—΄μ΄ ν¬ν•¨λ˜μ–΄ 있으면 결과물을 λ‹΄λŠ” results 에 μ €μž₯ν•˜λ„λ‘ λ™μž‘ν•˜κ³  μžˆλ‹€.

λ§ˆμ§€λ§‰μœΌλ‘œ μ‚΄νŽ΄λ³Ό λ©”μ„œλ“œλŠ” 결과물을 좜λ ₯ν•΄μ£ΌλŠ” λ©”μ„œλ“œλ‹€.

1
2
3
4
5
6
7
8
private fun printResults() {
    results.keys.sorted().forEach { filePath ->
        println("File: $filePath")
        results[filePath]?.forEach { (lineNumber: Int, line: String) ->
            println("Line $lineNumber: $line")
        }
    }
}

searchInFile λ©”μ„œλ“œμ—μ„œ results 에 μ €μž₯ν•œ 결과물에 λŒ€ν•΄ λͺ¨λ“  탐색이 λλ‚˜κ³  ν•œλ²ˆμ— 좜λ ₯을 ν•΄μ£ΌλŠ” λ©”μ„œλ“œμ΄λ‹€. 결과물에 λŒ€ν•΄ 순차적으둜 λ³΄μ—¬μ§ˆ 수 μžˆλ„λ‘ sortedλ₯Ό ν™œμš©ν–ˆκ³ , results 에 μžˆλŠ” Paire<Int, String> 에 λŒ€ν•΄ lineNumber 와 line 을 좜λ ₯ν•˜λ„λ‘ λ§Œλ“€μ—ˆλ‹€.

μ—¬κΈ°κΉŒμ§€κ°€ μ‹±κΈ€ μŠ€λ ˆλ“œλ‘œ λ™μž‘ν•˜λŠ” KGrep 의 κ΅¬ν˜„ λ‚΄μš©μ΄λ‹€. 이걸 μ•žμœΌλ‘œ KGrepV1 으둜 λͺ…μΉ­ν•˜μž.
그럼 이제 μ„±λŠ₯ ν…ŒμŠ€νŠΈλ₯Ό μ§„ν–‰ν•΄λ³΄μž. λ¨Όμ € μ„±λŠ₯ ν…ŒμŠ€νŠΈλ₯Ό μœ„ν•΄ λͺ‡κ°€μ§€ μž‘μ—…μ΄ ν•„μš”ν•˜λ‹€.

  1. ν…ŒμŠ€νŠΈ 데이터 생성
  2. κ΅¬ν˜„ν•œ KGrepV1 을 μ»΄νŒŒμΌν•˜κ³  .jar 파일둜 λ§Œλ“€μž.
  3. λ§Œλ“  .jar νŒŒμΌμ„ μ‹€ν–‰μ‹œν‚€λŠ”λ° time λͺ…λ Ήμ–΄λ₯Ό 톡해 μ–΄λŠμ •λ„ μ‹œκ°„μ΄ μ†Œμš” λ˜μ—ˆλŠ”μ§€ ν™•μΈν•΄λ³΄μž.
  4. grep κ³Ό KGrepV1 을 각각 μ‹€ν–‰ μ‹œν‚€κ³  μ„±λŠ₯을 λΉ„κ΅ν•΄λ³΄μž.

ν…ŒμŠ€νŠΈ λ°μ΄ν„°λŠ” μ•„λž˜μ™€ κ°™λ‹€.(μ—¬λŸ¬ 파일 쀑 총 6개의 파일만이 β€œkeyword” λΌλŠ” 단어λ₯Ό κ°–κ³  μžˆλ‹€.)

1
2
3
4
5
6
7
8
9
10
11
test-data/
β”œβ”€β”€ test.json
β”œβ”€β”€ dir_0/
β”‚   β”œβ”€ file_0.txt # 100λ§Œμ€„μ˜ ν…μŠ€νŠΈ 데이터
β”‚   ...
β”‚   └─ file_9.txt
...
└── dir_9/
     β”œβ”€ file_0.txt
     ...
     └─ file_9.txt

μ„±λŠ₯ 비ꡐ κ²°κ³ΌλŠ” μ•„λž˜μ™€ κ°™λ‹€.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
kotlinc KGrepV1.kt -include-runtime -d KGrepV1.jar # .jar 파일 생성
time java -jar KGrepV1.jar "keyword" "test-data"   # time 을 ν†΅ν•œ μ„±λŠ₯ μΈ‘μ •

# result (m3 pro 12 core 12 thread, RAM 18GB κΈ°μ€€)
## KGrepV1
File: test-data/dir_1/file_0.txt
Line 501: Line 500 in file_0.txt of dir_1. Contains keyword.
File: test-data/dir_3/file_7.txt
Line 501: Line 500 in file_7.txt of dir_3. Contains keyword.
File: test-data/dir_4/file_2.txt
Line 501: Line 500 in file_2.txt of dir_4. Contains keyword.
File: test-data/dir_5/file_5.txt
Line 501: Line 500 in file_5.txt of dir_5. Contains keyword.
File: test-data/dir_8/file_0.txt
Line 501: Line 500 in file_0.txt of dir_8. Contains keyword.
File: test-data/test.json
Line 2: "name": "keyword"
java -jar KGrepV1.jar "keyword" "test-data"  3.37s user 0.81s system 95% cpu 4.365 total

## grep
test-data/dir_5/file_5.txt:Line 500 in file_5.txt of dir_5. Contains keyword.
test-data/dir_4/file_2.txt:Line 500 in file_2.txt of dir_4. Contains keyword.
test-data/dir_3/file_7.txt:Line 500 in file_7.txt of dir_3. Contains keyword.
test-data/test.json:  "name": "keyword"
test-data/dir_1/file_0.txt:Line 500 in file_0.txt of dir_1. Contains keyword.
test-data/dir_8/file_0.txt:Line 500 in file_0.txt of dir_8. Contains keyword.
grep --color=auto --exclude-dir={.bzr,CVS,.git,.hg,.svn,.idea,.tox,.venv,venv  8.81s user 0.36s system 99% cpu 9.245 total

KGrepV1 κ³Ό grep 비ꡐ κ²°κ³Ό

ν•­λͺ©KGrepV1.jargrep
μ‚¬μš©μž μ‹œκ°„ (user)3.37s8.81s
μ‹œμŠ€ν…œ μ‹œκ°„ (system)0.81s0.36s
CPU μ‚¬μš©λ₯ 95%99%
총 μ†Œμš” μ‹œκ°„4.365s9.245s

KGrepV1 이 λ¦¬μ†ŒμŠ€λŠ” 적게 μ‚¬μš©ν•˜μ§€λ§Œ, λΉ λ₯΄κ²Œ μ²˜λ¦¬ν•˜λŠ” λͺ¨μŠ΅μ„ λ³Ό 수 μžˆμ—ˆλ‹€. KGrepV1 κ³Ό grep λͺ¨λ‘ CPU λ₯Ό 100% λ‚΄μ™Έλ‘œ μ‚¬μš©ν•˜κ³  μžˆλŠ” λͺ¨μŠ΅μ΄λ‹€.(μ€‘μš”!)

이제 여기에 λ©€ν‹° μŠ€λ ˆλ“œλ₯Ό λ„μž…ν•˜μ—¬ μ„±λŠ₯을 κ°œμ„ μ‹œμΌœ 보자.

KGrepV1 의 전체 μ½”λ“œλŠ” μ•„λž˜μ™€ κ°™λ‹€. κ°„λ‹¨ν•œ μ˜ˆμ™Έ μ²˜λ¦¬λ“±μ„ μΆ”κ°€ν–ˆλ‹€.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
package com.crispinlab.kgrep

import java.io.File
import java.util.LinkedList
import java.util.Queue

class KGrepV1(
    private val keyword: String,
    private val path: String
) {

    private val filesToProcess: Queue<File> = LinkedList()
    private val results = HashMap<String, List<Pair<Int, String>>>()

    fun search() {
        val startPath = File(path)
        collectFiles(startPath)
        val fileCount: Int = filesToProcess.size
        for (count: Int in 0 until fileCount) {
            val file: File = filesToProcess.poll() ?: break
            searchInFile(file)
        }
        printResults()
    }

    private fun collectFiles(startPath: File) {
        when {
            startPath.isFile -> filesToProcess.add(startPath)
            startPath.isDirectory -> startPath.listFiles()?.forEach { collectFiles(it) }
            else -> println("Cannot process this path: ${startPath.absolutePath}")
        }
    }

    private fun searchInFile(file: File) {
        if (!file.isFile || !file.canRead()) return
        val matchingLines: MutableList<Pair<Int, String>> = mutableListOf()
        file.useLines { lines ->
            lines.forEachIndexed { index, line ->
                if (line.contains(keyword)) {
                    matchingLines.add((index + 1) to line.trim())
                }
            }
        }
        if (matchingLines.isNotEmpty()) {
            results[file.path] = matchingLines
        }
    }

    private fun printResults() {
        results.keys.sorted().forEach { filePath ->
            println("File: $filePath")
            results[filePath]?.forEach { (lineNumber: Int, line: String) ->
                println("Line $lineNumber: $line")
            }
        }
    }
}

fun main(args: Array<String>) {
    if (args.size < 2) {
        println("Usage: kgrep <keyword> <relative path>")
        return
    }

    val keyword: String = args[0]
    val path: String = args[1]

    val kGrep = KGrepV1(keyword, path)
    kGrep.search()
}

λ©€ν‹° μŠ€λ ˆλ“œ

μ‹±κΈ€ μŠ€λ ˆλ“œλ‘œ κ΅¬ν˜„ν•œ KGrepV1 μ—μ„œ λ³€κ²½λœ λΆ€λΆ„λ§Œ λΉ λ₯΄κ²Œ μ‚΄νŽ΄λ³΄μž.

μš°μ„  μ‚¬μš©ν•˜λ˜ μžλ£Œκ΅¬μ‘°λ“€μ„ λ©€ν‹° μŠ€λ ˆλ“œ ν™˜κ²½μ—μ„œ μ‚¬μš©κ°€λŠ₯ν•˜λ„λ‘ μ§€μ›ν•΄μ£ΌλŠ” 자료ꡬ쑰둜 λ³€κ²½ν–ˆλ‹€.

1
2
private val results = ConcurrentHashMap<String, List<Pair<Int, String>>>()
private val filesToProcess = ConcurrentLinkedQueue<File>()

μ½”ν‹€λ¦°μ—μ„œ λ©€ν‹° μŠ€λ ˆλ“œλ₯Ό μ‚¬μš©ν•˜λŠ” λ°©λ²•μ—λŠ” μ—¬λŸ¬κ°€μ§€ 방법이 μžˆλŠ”λ° λ‚˜λŠ” java μ—μ„œ μ§€μ›ν•˜λ©΄μ„œ kotlin μ—μ„œ λ°”λ‘œ μ‚¬μš© κ°€λŠ₯ν•œ ExcutorService api λ₯Ό μ‚¬μš©ν–ˆλ‹€. λ©€ν‹° μŠ€λ ˆλ“œλ₯Ό μ‚¬μš©ν•¨μ— μžˆμ–΄ μŠ€λ ˆλ“œ 관리가 μ€‘μš”ν•œλ° λ‚΄λΆ€μ μœΌλ‘œ κ·Έκ±Έ ν•΄μ£Όλ©΄μ„œλ„ thread pool 을 μ‚¬μš©ν•  수 μžˆμ–΄ μ„ νƒν•˜κ²Œ λ˜μ—ˆλ‹€.

thread pool 의 κ°œμˆ˜λŠ” ν˜„μž¬ runtime μ—μ„œμ˜ μ‚¬μš©κ°€λŠ₯ν•œ CPU μ½”μ–΄ 수 만큼 지정해쀬닀.

1
2
3
private val threadPool: ExecutorService = Executors.newFixedThreadPool(
    Runtime.getRuntime().availableProcessors()
)

그리고 μœ„μ— μ„ μ–Έν•œ thread pool 을 μ‚¬μš©ν•˜κΈ° μœ„ν•΄ search λ©”μ„œλ“œλ₯Ό 변경해쀬닀.
μ—¬κΈ°μ„œλ„ κ³ λ―Όλ˜λŠ” 뢀뢄이 μžˆμ—ˆλŠ”λ°, thread λ₯Ό μ–΄λ–€ λ‹¨μœ„λ‘œ ν• λ‹Ήν•  것인가 에 λŒ€ν•œ κ³ λ―Όμ΄μ˜€λ‹€.
μ—¬κΈ°μ„œ λ‚˜λŠ” ꡳ이 ν•˜λ‚˜μ˜ νŒŒμΌμ„ μ—¬λŸ¬ thread μ—μ„œ μ‘°νšŒν•  ν•„μš”λŠ” 없을거라 μƒκ°ν–ˆλ‹€. λ¬Όλ‘  큰 파일이 λ§Žμ€ 경우라면 ν•˜λ‚˜μ˜ νŒŒμΌμ„ λ‘κ°œ μ΄μƒμ˜ thread λ₯Ό μ‚¬μš©ν•˜μ—¬ νƒμƒ‰ν•˜λŠ” 방법도 μžˆμ„κ²ƒ 같은데 일단 κ΅¬ν˜„ μžμ²΄κ°€ 맀우 λ³΅μž‘ν•  수 있고(λ™μ‹œμ„± 처리 λ“±), μ²˜μŒμ—λŠ” κ°„λ‹¨ν•˜κ³  λΉ λ₯΄κ²Œ κ΅¬ν˜„ν•˜λŠ”κ²Œ 먼저라고 μƒκ°ν–ˆκΈ° λ•Œλ¬Έμ΄λ‹€.
κ·Έλž˜μ„œ λ‚˜λŠ” file λ‹¨μœ„λ‘œ thread λ₯Ό ν• λ‹Ήν•˜λŠ” 방식을 μ„ νƒν–ˆλ‹€.
이미 file λͺ©λ‘μ„ Queue 에 λ‹΄κ³  μžˆμ—ˆκΈ° λ•Œλ¬Έμ— ν•΄λ‹Ή Queue μ—μ„œ file 을 ν•˜λ‚˜μ”© κΊΌλ‚΄λ©΄μ„œ ν•΄λ‹Ή file 을 νƒμƒ‰ν• λ•Œ thread λ₯Ό 각각 ν• λ‹Ή ν•˜λŠ” λ°©μ‹μœΌλ‘œ μž‘μ—…ν•˜λ©΄ κ°„λ‹¨ν•˜κ²Œ λ©€ν‹° μŠ€λ ˆλ“œλ₯Ό μ‚¬μš©ν•  수 μžˆμ„κ±°λΌ μƒκ°ν–ˆκΈ° λ•Œλ¬Έμ΄λ‹€.
μ•„λž˜λŠ” μ„ νƒν•œ λ°©μ‹λŒ€λ‘œ λ™μž‘ν•˜κΈ° μœ„ν•΄ λ³€κ²½ν•œ search λ©”μ„œλ“œλ‹€.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
fun search() {
    val startPath = File(path)

    if (!startPath.exists()) {
        println("Path does not exist: $path")
        return
    }

    collectFiles(startPath)

    val fileCount: Int = filesToProcess.size
    for (count: Int in 0 until fileCount) {
        val file: File = filesToProcess.poll() ?: break
        threadPool.submit { searchInFile(file) }
    }
    threadPool.shutdown()
    threadPool.awaitTermination(1, TimeUnit.HOURS)
    printResults()
}

λ™μž‘ 방식은 μ•„λž˜μ™€ κ°™λ‹€.

  1. threadPool.submit 을 톡해 searchInFile λ©”μ„œλ“œλ₯Ό 각각의 thread μ—μ„œ λ™μž‘ν•˜λ„λ‘ ν• λ‹Ήν•œλ‹€.
  2. threadPool.shutdown 을 톡해 λͺ¨λ“  thread pool 에 μžˆλŠ” thread κ°€ μž‘μ—…μ„ λ§ˆμΉ λ•Œ κΉŒμ§€ κΈ°λ‹€λ¦°λ‹€.
  3. λ§Œμ•½μ„ λŒ€λΉ„ν•˜μ—¬ threadPool.awaitTermination 을 톡해 timeout 을 μ„€μ •
  4. threadPool 에 μžˆλŠ” λͺ¨λ“  thread κ°€ μž‘μ—…μ„ 마치면 결과물을 좜λ ₯ν•œλ‹€.

그럼 이제 KGrepV2 에 λŒ€ν•œ μ„±λŠ₯ ν…ŒμŠ€νŠΈλ₯Ό μ§„ν–‰ν•΄ 보자. ν…ŒμŠ€νŠΈ 데이터와 μΈ‘μ • 방식을 KGrepV1 을 μΈ‘μ • ν–ˆμ„λ•Œμ™€ κ°™λ‹€.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
kotlinc KGrepV2.kt -include-runtime -d KGrepV2.jar # .jar 파일 생성
time java -jar KGrepV2.jar "keyword" "test-data"   # time 을 ν†΅ν•œ μ„±λŠ₯ μΈ‘μ •

# result (m3 pro 12 core 12 thread, RAM 18GB κΈ°μ€€)
## KGrepV1
File: test-data/dir_1/file_0.txt
Line 501: Line 500 in file_0.txt of dir_1. Contains keyword.
File: test-data/dir_3/file_7.txt
Line 501: Line 500 in file_7.txt of dir_3. Contains keyword.
File: test-data/dir_4/file_2.txt
Line 501: Line 500 in file_2.txt of dir_4. Contains keyword.
File: test-data/dir_5/file_5.txt
Line 501: Line 500 in file_5.txt of dir_5. Contains keyword.
File: test-data/dir_8/file_0.txt
Line 501: Line 500 in file_0.txt of dir_8. Contains keyword.
File: test-data/test.json
Line 2: "name": "keyword"
java -jar KGrepV1.jar "keyword" "test-data"  3.33s user 0.81s system 94% cpu 4.376 total

## KGrepV2
File: test-data/dir_1/file_0.txt
Line 501: Line 500 in file_0.txt of dir_1. Contains keyword.
File: test-data/dir_3/file_7.txt
Line 501: Line 500 in file_7.txt of dir_3. Contains keyword.
File: test-data/dir_4/file_2.txt
Line 501: Line 500 in file_2.txt of dir_4. Contains keyword.
File: test-data/dir_5/file_5.txt
Line 501: Line 500 in file_5.txt of dir_5. Contains keyword.
File: test-data/dir_8/file_0.txt
Line 501: Line 500 in file_0.txt of dir_8. Contains keyword.
File: test-data/test.json
Line 2: "name": "keyword"
Search completed in 1091 ms
java -jar KGrepV2.jar "keyword" "test-data"  8.26s user 1.40s system 832% cpu 1.162 total

## grep
test-data/dir_5/file_5.txt:Line 500 in file_5.txt of dir_5. Contains keyword.
test-data/dir_4/file_2.txt:Line 500 in file_2.txt of dir_4. Contains keyword.
test-data/dir_3/file_7.txt:Line 500 in file_7.txt of dir_3. Contains keyword.
test-data/test.json:  "name": "keyword"
test-data/dir_1/file_0.txt:Line 500 in file_0.txt of dir_1. Contains keyword.
test-data/dir_8/file_0.txt:Line 500 in file_0.txt of dir_8. Contains keyword.
grep --color=auto --exclude-dir={.bzr,CVS,.git,.hg,.svn,.idea,.tox,.venv,venv  9.12s user 0.41s system 98% cpu 9.685 total

KGrepV1,KGrepV2, grep 비ꡐ κ²°κ³Ό

ν•­λͺ©KGrepV1.jarKGrepV2.jargrep
μ‚¬μš©μž μ‹œκ°„ (user)3.33s8.26s9.12s
μ‹œμŠ€ν…œ μ‹œκ°„ (system)0.81s1.40s0.41s
CPU μ‚¬μš©λ₯ 94%832%98%
총 μ†Œμš” μ‹œκ°„4.376s1.162s9.685s

KGrepV2 κ°€ 속도 μΈ‘λ©΄μ—μ„œλŠ” 맀우 λΉ λ₯Έκ±Έ λ³Ό 수 μžˆλ‹€.
λ©€ν‹° μŠ€λ ˆλ“œλ₯Ό μ‚¬μš©ν–ˆκΈ° λ•Œλ¬Έμ— CPU λ₯Ό 100% 이상 μ‚¬μš©ν•˜λŠ”κ±Έ λ³Ό 수 μžˆλ‹€.
ν™•μ‹€νžˆ λ©€ν‹° μŠ€λ ˆλ“œλ₯Ό μ‚¬μš©ν•˜λ‹ˆ 속도적인 λΆ€λΆ„μ—μ„œλŠ” 맀우 κ°œμ„ λœκ±Έ λ³Ό 수 μžˆλ‹€. λ‹€λ§Œ, λ¦¬μ†ŒμŠ€λŠ” κ·Έ 만큼 μ‚¬μš©ν•˜μ§€λ§Œ..

KGrepV2 의 전체 μ½”λ“œλŠ” μ•„λž˜μ™€ κ°™λ‹€.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
package com.crispinlab.kgrep

import java.io.File
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.ConcurrentLinkedQueue
import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit
import kotlin.system.measureTimeMillis

class KGrepV2(
    private val keyword: String,
    private val path: String
) {
    private val results = ConcurrentHashMap<String, List<Pair<Int, String>>>()

    private val filesToProcess = ConcurrentLinkedQueue<File>()

    private val threadPool: ExecutorService = Executors.newFixedThreadPool(
        Runtime.getRuntime().availableProcessors()
    )

    fun search() {
        val startPath = File(path)

        if (!startPath.exists()) {
            println("Path does not exist: $path")
            return
        }

        collectFiles(startPath)

        val time: Long = measureTimeMillis {
            val fileCount: Int = filesToProcess.size
            for (count: Int in 0 until fileCount) {
                val file: File = filesToProcess.poll() ?: break
                threadPool.submit { searchInFile(file) }
            }
            threadPool.shutdown()
            threadPool.awaitTermination(1, TimeUnit.HOURS)
        }
        printResults()
        println("Search completed in $time ms")
    }

    private fun collectFiles(startPath: File) {
        when {
            startPath.isFile -> filesToProcess.add(startPath)
            startPath.isDirectory -> startPath.listFiles()?.forEach { collectFiles(it) }
            else -> println("Cannot process this path: ${startPath.absolutePath}")
        }
    }

    private fun searchInFile(file: File) {
        try {
            if (!file.isFile || !file.canRead()) return
            val matchingLines: MutableList<Pair<Int, String>> = mutableListOf()
            file.useLines { lines ->
                lines.forEachIndexed { index, line ->
                    if (line.contains(keyword)) {
                        matchingLines.add((index + 1) to line.trim())
                    }
                }
            }
            if (matchingLines.isNotEmpty()) {
                results[file.path] = matchingLines
            }
        } catch (e: Exception) {
            System.err.println("Error processing file ${file.path}: ${e.message}")
        }
    }

    private fun printResults() {
        results.keys.sorted().forEach { filePath ->
            println("File: $filePath")
            results[filePath]?.forEach { (lineNumber: Int, line: String) ->
                println("Line $lineNumber: $line")
            }
        }
    }
}

fun main(args: Array<String>) {
    if (args.size < 2) {
        println("Usage: kgrep <keyword> <relative path>")
        return
    }

    val keyword: String = args[0]
    val path: String = args[1]

    val kGrep = KGrepV2(keyword, path)
    kGrep.search()
}

πŸ’‘ 마무리

μ˜€λŠ˜μ€ 라이브 μ½”λ”© ν…ŒμŠ€νŠΈ μ˜ˆμ‹œ 문제λ₯Ό ν’€μ–΄λ΄€λ‹€.
μ›Ή κ°œλ°œμ„ ν•˜λ‹€λ³΄λ©΄ μš”κ΅¬μ‚¬ν•­μ— λ§žλŠ” API λ₯Ό 많이 κ΅¬ν˜„ν•˜μ§€, μ΄λ ‡κ²Œ μš”κ΅¬μ‚¬ν•­μ— λ§žλŠ” μ• ν”Œλ¦¬μΌ€μ΄μ…˜μ„ κ΅¬ν˜„ν•  일이 생각보닀 λ§Žμ§€ μ•Šμ€λ° μ˜€λžœλ§Œμ— μ΄λŸ°μ €κ±΄ 고민을 ν•˜λ©΄μ„œ μš”κ΅¬μ‚¬ν•­μ— λ§žλŠ” μ• ν”Œλ¦¬μΌ€μ΄μ…˜μ„ κ΅¬ν˜„ν•˜λ‹ˆ μž¬λ―Έμžˆμ—ˆλ‹€.
이번 κΈ°νšŒμ— 파일 처리 및 λ©€ν‹° μŠ€λ ˆλ“œμ— λŒ€ν•΄ 닀뀄볼 수 μžˆμ–΄ κ²½ν—˜μ μœΌλ‘œ 많이 도움이 λ˜μ—ˆλ‹€.

속도가 λΉ λ₯΄λ‹€κ³  ν•΄μ„œ 무쑰건 쒋은 μ• ν”Œλ¦¬μΌ€μ΄μ…˜μ„ μ•„λ‹ˆλ‹€. λ¦¬μ†ŒμŠ€λ₯Ό ν•œμ •μ μœΌλ‘œ μ‚¬μš©ν•΄μ•Ό ν•˜λŠ” ν™˜κ²½μ—μ„œ KGrepV2 λŠ” 그리 쒋은 μ• ν”Œλ¦¬μΌ€μ΄μ…˜μ΄λΌ ν•  수 없을거닀. λ‹€λ§Œ 기쑴에 많이 μ‚¬μš©ν•˜λ˜ grep 의 속도적인 ν•œκ³„λ₯Ό μ–΄λ–»κ²Œ 극볡할 수 μžˆμ„μ§€μ— λŒ€ν•΄ 문제λ₯Ό 톡해 μž μ‹œλ‚˜λ§ˆ 생각할 수 μžˆμ–΄μ„œ λ„ˆλ¬΄ μ’‹μ•˜λ‹€.

κΈ°νšŒκ°€ λœλ‹€λ©΄ ν•΄λ‹Ή 문제λ₯Ό λ‹€λ₯Έ κ°œλ°œμžλΆ„λ“€μ€ μ–΄λ–»κ²Œ ν’€μ—ˆλŠ”μ§€ λ‚΄κ°€ κ΅¬ν˜„ν•œ λ‚΄μš©κ³ΌλŠ” μ–΄λ–€ 뢀뢄이 λ‹€λ₯Όμ§€ 비ꡐ해보고 μ‹Άλ‹€.


πŸ”– REFERENCE

This post is licensed under CC BY 4.0 by the author.